[torqueusers] Pulling failed nodes based on the launched process' exit code

Garrick Staples garrick at usc.edu
Wed Sep 5 15:08:40 MDT 2007


On Wed, Sep 05, 2007 at 01:21:10PM -0700, Peter Wyckoff alleged:
> 3. based on the exit code of the process/node, I would like to potentially
> be able to mark a box offline or something like -n "suspicious" or
> something. And in an advanced world only do this if the last X jobs failed
> on this node or X out of Y.
> 
> Is there any capability in torque to do #3?  I could probably do it with a
> wrapper around the process running on each node. Something like:

Could you have the node health check script grep the MOM log for job exit codes and return an error if too many jobs have failed?
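Below is a minimal sketch of that idea in Python. It assumes that pbs_mom is configured to run a node health check script and that it follows the usual convention of treating output beginning with "ERROR" as a node failure. The log-line pattern `EXIT_RE` is a hypothetical placeholder, not the actual pbs_mom log format, so check it against your own MOM logs. The script name, its arguments (LOGFILE, X, Y) and the helper functions are also illustrative. The "X out of the last Y jobs failed" policy from the question is implemented with a bounded deque.

```python
import re
import sys
from collections import deque

# ASSUMPTION: hypothetical pattern for a MOM log line that records a job's
# exit code. Adjust it to whatever your pbs_mom actually writes.
EXIT_RE = re.compile(r"Job;(?P<job>[^;]+);.*exit status[ =:]+(?P<code>-?\d+)")


def recent_exit_codes(lines, window):
    """Return the exit codes of the last `window` jobs found in `lines`."""
    codes = deque(maxlen=window)  # older entries fall off automatically
    for line in lines:
        m = EXIT_RE.search(line)
        if m:
            codes.append(int(m.group("code")))
    return list(codes)


def check(lines, max_failures, window):
    """Return an ERROR message if at least `max_failures` of the last
    `window` jobs exited non-zero, otherwise None."""
    codes = recent_exit_codes(lines, window)
    failures = sum(1 for c in codes if c != 0)
    if failures >= max_failures:
        return f"ERROR {failures} of last {len(codes)} jobs failed on this node"
    return None


def main(path, max_failures, window):
    # Print nothing when the node looks healthy. Print an ERROR line otherwise,
    # which (per the assumed health check convention) should mark the node down.
    with open(path, errors="replace") as f:
        msg = check(f, max_failures, window)
    if msg:
        print(msg)


# Hypothetical invocation: health_check.py LOGFILE X Y
if __name__ == "__main__" and len(sys.argv) == 4:
    main(sys.argv[1], int(sys.argv[2]), int(sys.argv[3]))
```

Setting X equal to Y gives the simpler "the last X jobs all failed" rule. Because the check only reads the log, it needs no wrapper around the user's process. The trade-off is that it can only see failures that the MOM actually logs.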


