[torqueusers] Question About Desired Behavior
dbeer at adaptivecomputing.com
Tue Mar 26 12:14:15 MDT 2013
On Tue, Mar 26, 2013 at 11:33 AM, Glen Beane <glen.beane at gmail.com> wrote:
> On Tue, Mar 26, 2013 at 12:41 PM, David Beer
> <dbeer at adaptivecomputing.com> wrote:
> > All,
> > Our QA tests have exposed that when a job file is loaded saying that its
> > state is running but there is no exec host list defined, we don't handle
> > that state; that is, we attempt to perform actions on the job that assume it
> > is running, but we can't talk to the mom because we don't know what mom it
> > is on.
> > I can think of two different behaviors:
> > 1. delete the job
> > 2. requeue the job
> > Which one would you all prefer?
> how does a job get into this state in the first place?
At this point it appears to be a corrupted job file. We don't know more than
that, but we need to handle the case regardless.
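
To make the two options concrete, here is a minimal sketch of the check at
load time. It is illustrative only and is not TORQUE code: the Job class, the
JobState and Policy enums, and recover_loaded_job are all hypothetical names,
and the sketch assumes that "no exec host list" means an empty or missing
field.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class JobState(Enum):
    QUEUED = "Q"
    RUNNING = "R"


class Policy(Enum):
    DELETE = "delete"    # option 1 from the thread
    REQUEUE = "requeue"  # option 2 from the thread


@dataclass
class Job:
    # Hypothetical stand-in for a job record loaded from disk.
    job_id: str
    state: JobState
    exec_host: Optional[str] = None


def recover_loaded_job(job: Job, policy: Policy) -> Optional[Job]:
    """Return the job to keep, or None if it should be deleted."""
    # Only the inconsistent case needs repair: state says running,
    # but there is no exec host to contact (None or empty string).
    if job.state is not JobState.RUNNING or job.exec_host:
        return job
    if policy is Policy.REQUEUE:
        # Put the job back in the queue so it can be scheduled again.
        job.state = JobState.QUEUED
        job.exec_host = None
        return job
    # Policy.DELETE: tell the caller to drop the job.
    return None
```

Either way the check runs once, when the job file is read, so nothing later
tries to contact a mom that cannot be identified.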
David Beer | Senior Software Engineer