[torqueusers] Question About Desired Behavior

David Beer dbeer at adaptivecomputing.com
Tue Mar 26 12:14:15 MDT 2013


On Tue, Mar 26, 2013 at 11:33 AM, Glen Beane <glen.beane at gmail.com> wrote:

> On Tue, Mar 26, 2013 at 12:41 PM, David Beer
> <dbeer at adaptivecomputing.com> wrote:
> > All,
> >
> > Our QA tests have exposed that when a job file is loaded saying that its
> > state is running but there is no exec host list defined, we don't handle
> > this state. That is, we attempt to perform actions on the job that assume
> > it is running, but we can't talk to the mom because we don't know which
> > mom it is.
> > I can think of two different behaviors:
> >
> > 1. delete the job
> > 2. requeue the job
> >
> > Which one would you all prefer?
>
>
> how does a job get into this state in the first place?
>

At this point it appears to be caused by a corrupted job file. We don't know
more than that, but we need to handle it regardless.
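
To make the condition concrete, here is a minimal sketch in C of the check
being discussed: a loaded job whose state is running but which has no exec
host list. All names here (job_rec, check_loaded_job, the enums) are
hypothetical and made up for illustration; this is not the actual TORQUE
data model or recovery code. The caller passes in whichever of the two
behaviors (delete or requeue) ends up being preferred.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration only; not TORQUE's real structures. */
enum job_state { JOB_QUEUED, JOB_RUNNING, JOB_COMPLETE };

enum recover_action { RECOVER_KEEP, RECOVER_DELETE, RECOVER_REQUEUE };

struct job_rec {
    enum job_state state;
    const char *exec_host;  /* NULL or "" when no exec host list was loaded */
};

/* Returns nonzero when the job record names at least one exec host. */
static int has_exec_host(const struct job_rec *job)
{
    return job->exec_host != NULL && job->exec_host[0] != '\0';
}

/*
 * Decide what to do with a job record just loaded from disk.
 * A job marked running without an exec host list is inconsistent, so
 * the caller-chosen policy (RECOVER_DELETE or RECOVER_REQUEUE) is
 * returned. Every other job is left alone.
 */
enum recover_action check_loaded_job(const struct job_rec *job,
                                     enum recover_action on_inconsistent)
{
    assert(job != NULL);

    if (job->state == JOB_RUNNING && !has_exec_host(job))
        return on_inconsistent;

    return RECOVER_KEEP;
}
```

With this shape the inconsistency is caught once, at load time, before any
code that assumes a running job tries to contact a mom.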

David





-- 
David Beer | Senior Software Engineer
Adaptive Computing