[torqueusers] Torque/maui node failure policy

Garrick Staples garrick at usc.edu
Mon Jun 18 18:30:35 MDT 2007


On Mon, Jun 18, 2007 at 05:28:27PM -0700, Peter Wyckoff alleged:
> 
> Hi,
> 
> I want to configure torque in such a way that if any node other than the
> node running pbsdsh (the head node?) fails, do __NOTHING__  - don't cancel
> the job or re-run it or anything.
> 
> My code handles all failures other than the 1st node failing.
> 
> Is there a way to configure torque to do nothing unless the head node fails?
> Or to do nothing no matter what? (Head node failures should be rare compared
> to failures of other nodes.)

TORQUE doesn't cancel jobs when sister nodes go down.  You might be
seeing Maui do that; it has a 5-minute job delete hardwired in.

-- 
Garrick Staples, GNU/Linux HPCC SysAdmin
University of Southern California

Please avoid sending me Word or PowerPoint attachments.
See http://www.gnu.org/philosophy/no-word-attachments.html
09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0