[torqueusers] RE: Health check script failure and offlining

Smith, Jerry Don II jdsmit at sandia.gov
Sat Dec 3 12:53:55 MST 2005


Richard,


We wrote a cron script that takes care of this.  But yes, MOAB takes care of this on its own, even allowing "triggers" to adjust many things (node state, reservations, etc.).


Jerry


All,

I have set up a health check script in $PBS/mom_priv/config.  It works
fine in that it sets the 'message' attribute for the problem mom/node when
there is a failure, but how can I get the node's status adjusted to
'offline' (pbsnodes -o nodeXXX) when the failure occurs?  The manual
says that:

  "Cluster schedulers can be configured to adjust a given node's state
   based on this [ERROR message] information."

Perhaps this is only a MOAB feature.
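For archive readers: the TORQUE documentation describes the health check contract as the script printing the keyword ERROR to stdout, followed by a message, when it detects a problem. The sketch below is a minimal example of such a script in Python. The checked path (`/tmp`) and the 1 GiB threshold are hypothetical placeholders, not anything from this thread.

```python
#!/usr/bin/env python3
"""Hypothetical node health check: report ERROR when free disk space is low."""
import shutil

# Hypothetical values: adjust for your site.
CHECK_PATH = "/tmp"
MIN_FREE_BYTES = 1024 ** 3  # 1 GiB


def check_disk(path=CHECK_PATH, min_free=MIN_FREE_BYTES):
    """Return an ERROR line if free space on `path` is below `min_free`, else None."""
    free = shutil.disk_usage(path).free
    if free < min_free:
        return f"ERROR: only {free} bytes free on {path}"
    return None


def main():
    problem = check_disk()
    if problem:
        # pbs_mom records this output in the node's 'message' attribute.
        print(problem)
    # Print nothing when the node is healthy.


if __name__ == "__main__":
    main()
```

The MOM would then be pointed at it in mom_priv/config with something like `$node_check_script /usr/local/sbin/check_disk.py` (path hypothetical), with `$node_check_interval` controlling how often it runs.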