[torqueusers] Torque/maui node failure policy
garrick at usc.edu
Mon Jun 18 18:30:35 MDT 2007
On Mon, Jun 18, 2007 at 05:28:27PM -0700, Peter Wyckoff alleged:
> I want to configure torque in such a way that if any node other than the
> node running pbsdsh (the head node?) fails, do __NOTHING__ - don't cancel
> the job or re-run it or anything.
> My code handles all failures other than the 1st node failing.
> Is there a way to configure torque to do nothing unless the head node fails?
> Or to do nothing no matter what (since head node failures should be rare
> compared with failures of other nodes)?
TORQUE doesn't cancel jobs when sister nodes go down. You might be
seeing Maui do that; it has a 5-minute job delete hardwired in.
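
To make the head node versus sister node distinction concrete, here is a minimal sketch in Python. It is only an illustration, not anything TORQUE or Maui ships: the function names are hypothetical, and it assumes the job's exec_host string looks like "host/cpu+host/cpu" with the first host being the one running the job script (the mother superior). The list of down nodes is passed in by the caller; how you collect it is up to you.

```python
def hosts_from_exec_host(exec_host):
    """Return the unique hostnames of an exec_host string, in order.

    Assumed format: "node01/0+node01/1+node02/0" (host/cpu joined by '+').
    """
    hosts = []
    for entry in exec_host.split("+"):
        host = entry.split("/", 1)[0].strip()
        if host and host not in hosts:
            hosts.append(host)
    return hosts


def classify_failure(exec_host, down_nodes):
    """Classify a node failure for one job (hypothetical helper).

    Returns "head" if the first host (assumed mother superior) is down,
    "sister" if only other hosts of the job are down, "none" otherwise.
    """
    hosts = hosts_from_exec_host(exec_host)
    if not hosts:
        return "none"
    down = set(down_nodes)
    if hosts[0] in down:
        return "head"
    if any(h in down for h in hosts[1:]):
        return "sister"
    return "none"
```

With something like this, a site script could leave the job alone on "sister" and only react on "head", which is the policy asked about above.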
Garrick Staples, GNU/Linux HPCC SysAdmin
University of Southern California
Please avoid sending me Word or PowerPoint attachments.
09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0