[torqueusers] jobs stuck in "W" state

David Golden dgolden at cp.dias.ie
Thu Feb 23 10:17:43 MST 2006

On 2006-02-23 16:41:45 +0000, David Golden wrote:
> If file stage-in fails at job start, the job is postponed for
> half an hour and an email sent to the user
> rather than the job being totally removed from the queue.
> But once exec_host becomes set, it just keeps trying the
> same node again (at least for torque-1.2.0p6). ISTR discussions
> of more flexible behaviour a while back.
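
To see whether a waiting job has already been pinned to a node, the full
job status can be inspected. The following is only a rough sketch: it assumes
that `qstat -f <jobid>` prints indented "name = value" lines and that
job_state and exec_host appear among them. The helper names
(parse_qstat_full, describe) are made up for illustration and are not part
of torque. Wrapped continuation lines are not rejoined.

```python
import subprocess
import sys


def parse_qstat_full(text):
    """Collect 'name = value' attribute lines from `qstat -f` output.

    Assumption: attributes are printed one per line as 'name = value'.
    Wrapped continuation lines are not rejoined in this sketch.
    """
    attrs = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition(" = ")
        if sep:
            attrs[name] = value
    return attrs


def describe(attrs):
    """Summarise the two attributes relevant to the stuck-in-W problem."""
    state = attrs.get("job_state", "?")
    host = attrs.get("exec_host", "(not set)")
    return "state=%s exec_host=%s" % (state, host)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python check_job.py <jobid>
    result = subprocess.run(
        ["qstat", "-f", sys.argv[1]],
        capture_output=True, text=True, check=True,
    )
    print(describe(parse_qstat_full(result.stdout)))
```

A job reported in state W with exec_host already set would match the
situation described above, where the retry goes back to the same node.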

Self-replying with refs:

The stage-in behaviour was actually mentioned days ago on-list:

Would be nice (tm) if there was an option to simply
have the job rejected if stage-in fails.  

Related to the "sticky exec_host" thing:
