[torqueusers] reply code=15001...
garrick at usc.edu
Thu Oct 25 13:36:03 MDT 2007
On Thu, Oct 25, 2007 at 01:12:32PM -0400, nathaniel.x.woody at gsk.com alleged:
> Huh, to follow up on this, what are the rare Bad Things that can happen
> here (I decided years ago to ignore the millions of these we get)?
Maui/Moab temporarily sets the job's nodes request to the full nodelist and
restores the original request after the job starts. If the start fails, the job
can be left tied to those specific nodes instead of simply being retried on
other nodes. The worst case is a node going down during the job start, which
leaves the job impossible to run.
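
To make the failure mode concrete, here is a rough toy model in Python. It is
not TORQUE or Maui/Moab code: the Job class, start_job(), runnable(), and the
node names are all hypothetical, and it assumes a request is either a plain
node count or a "+"-separated list of node names.

```python
# Toy model only: not TORQUE or Maui/Moab code. All names are hypothetical.

class Job:
    def __init__(self, nodes_request):
        # Either a generic count ("2") or an explicit list ("node01+node02").
        self.nodes_request = nodes_request


def start_job(job, chosen_nodes, up_nodes):
    """Pin the request to the chosen nodes, try to start, restore on success."""
    original = job.nodes_request
    job.nodes_request = "+".join(chosen_nodes)  # temporary explicit nodelist
    if not all(n in up_nodes for n in chosen_nodes):
        return False  # failed start: the temporary nodelist is left in place
    job.nodes_request = original  # restored only after a successful start
    return True


def runnable(job, up_nodes):
    """A count can use any up nodes; an explicit list needs those exact nodes."""
    if job.nodes_request.isdigit():
        return int(job.nodes_request) <= len(up_nodes)
    return all(n in up_nodes for n in job.nodes_request.split("+"))


job = Job("2")
up = {"node01", "node03"}  # node02 went down during the job start
started = start_job(job, ["node01", "node02"], up)
print(started, job.nodes_request, runnable(job, up))
# prints: False node01+node02 False
```

Two nodes are still up, so the original request of "2" could have been retried
elsewhere, but the job is now pinned to node01+node02 and cannot run.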