[torqueusers] Pulling failed nodes based on the launched process' exit code
garrick at usc.edu
Wed Sep 5 15:08:40 MDT 2007
On Wed, Sep 05, 2007 at 01:21:10PM -0700, Peter Wyckoff alleged:
> 3. based on the exit code of the process/node, I would like to potentially
> be able to mark a box offline or something like -n "suspicious" or
> something. And in an advanced world only do this if the last X jobs failed
> on this node or X out of Y.
> Is there any capability in torque to do #3? I could probably do it with a
> wrapper around the process running on each node. Something like:
Have the health check script grep the mom log and possibly return an error?
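
A rough sketch of that idea in Python, covering the "X out of Y" rule as well. The assumptions are mine, not tested against a real cluster: the script is wired in through the mom config ($node_check_script and $node_check_interval), the path of the current mom log is passed as the first argument, and pbs_mom treats a line of output starting with "ERROR" as a health failure. The EXIT_RE pattern and the function names are hypothetical; the exact wording of exit status lines in the mom log depends on the TORQUE version, so adjust the regex to whatever your logs contain.

```python
import re
import sys
from collections import deque

# Hypothetical pattern: the real mom log wording for a job's exit status
# varies by TORQUE version, so adjust this to match your own logs.
EXIT_RE = re.compile(r"exit[ _]status[=: ]+(-?\d+)", re.IGNORECASE)


def recent_exit_codes(lines, window):
    """Return the exit codes of the last `window` jobs found in `lines`."""
    codes = deque(maxlen=window)  # older entries fall off automatically
    for line in lines:
        match = EXIT_RE.search(line)
        if match:
            codes.append(int(match.group(1)))
    return list(codes)


def health_message(lines, max_failures=3, window=5):
    """Return an "ERROR ..." line if at least `max_failures` of the last
    `window` jobs exited non-zero, otherwise None."""
    codes = recent_exit_codes(lines, window)
    failures = sum(1 for code in codes if code != 0)
    if failures >= max_failures:
        return "ERROR %d of last %d jobs exited non-zero" % (failures, len(codes))
    return None


def main(argv):
    # Assumption: the mom log path is given as the first argument.
    if len(argv) < 2:
        return
    with open(argv[1], errors="replace") as log:
        message = health_message(log)
    if message:
        # Assumption: pbs_mom picks up output that starts with "ERROR".
        print(message)


if __name__ == "__main__":
    main(sys.argv)
```

The thresholds (3 of the last 5) are placeholders. What happens to a node that reports an error (marked down, offlined, or just annotated) depends on how the server and scheduler are set up to react, so check that before relying on it.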