[torqueusers] RE: Health check script failure and offlining
Smith, Jerry Don II
jdsmit at sandia.gov
Sat Dec 3 12:53:55 MST 2005
Richard,
We wrote a cron script that takes care of this. But yes, MOAB takes care of this all on its own, even allowing "triggers" to adjust many things (node state, reservations, etc.).
Jerry
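The cron script mentioned above is not included in the thread. What follows is a hypothetical sketch of such a script, not the one Jerry wrote. It assumes that `pbsnodes -a` prints each node name on an unindented line followed by indented `attribute = value` lines, and that a health-check failure shows up inside the node's `status` attribute as a comma-separated `message=ERROR...` entry. Check both assumptions against your own `pbsnodes -a` output before relying on it. Nodes already marked offline are skipped; every other flagged node is offlined with `pbsnodes -o`.

```python
import subprocess
import sys
from typing import Dict, List


def parse_pbsnodes(text: str) -> Dict[str, Dict[str, str]]:
    """Parse `pbsnodes -a` output into {node_name: {attribute: value}}.

    Assumed layout: an unindented node name, then indented
    "attribute = value" lines, with blank lines between nodes.
    """
    nodes: Dict[str, Dict[str, str]] = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            current = None  # a blank line ends the current node's block
            continue
        if not line[0].isspace():
            current = line.strip()  # unindented line: a new node name
            nodes[current] = {}
        elif current is not None and "=" in line:
            # Split only on the first "=": the status value contains more.
            key, _, value = line.partition("=")
            nodes[current][key.strip()] = value.strip()
    return nodes


def nodes_to_offline(nodes: Dict[str, Dict[str, str]]) -> List[str]:
    """Return nodes whose status carries a message=ERROR entry and that are not offline yet."""
    flagged = []
    for name, attrs in nodes.items():
        if "offline" in attrs.get("state", ""):
            continue  # already offline, nothing to do
        for item in attrs.get("status", "").split(","):
            if item.strip().startswith("message=ERROR"):
                flagged.append(name)
                break
    return flagged


def main() -> int:
    listing = subprocess.run(["pbsnodes", "-a"], capture_output=True,
                             text=True, check=True).stdout
    for node in nodes_to_offline(parse_pbsnodes(listing)):
        # Same command as the manual step: pbsnodes -o nodeXXX
        subprocess.run(["pbsnodes", "-o", node], check=True)
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

Run from cron at whatever interval suits the site. The script deliberately never clears the offline flag, so an administrator still has to inspect and return each node by hand.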
All,
I have set up a health check script in $PBS/mom_priv/config. It works
fine in that it sets the 'message' attribute for the problem mom/node when
there is a failure, but how can I get the node's status adjusted to
'offline' (pbsnodes -o nodeXXX) when the failure occurs? The manual says that:
"Cluster schedulers can be configured to adjust a given node's state
based on this [ERROR message] information."
Perhaps this is only a MOAB feature.
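For context, here is a minimal, hypothetical health check script of the kind described above. It assumes the MOM runs the configured script periodically and copies output lines that begin with `ERROR` into the node's `message` attribute. The watched path (`/tmp`) and the 1 GiB free-space threshold are illustrative choices, not anything from this thread.

```python
import shutil
import sys
from typing import Optional

# Hypothetical values: watch /tmp and complain below 1 GiB free.
WATCH_PATH = "/tmp"
MIN_FREE_BYTES = 1 << 30


def check_free_space(path: str, min_free: int) -> Optional[str]:
    """Return an ERROR line if `path` has fewer than `min_free` bytes free, else None."""
    free = shutil.disk_usage(path).free
    if free < min_free:
        return f"ERROR {path} has only {free} bytes free (need {min_free})"
    return None


def main() -> int:
    problem = check_free_space(WATCH_PATH, MIN_FREE_BYTES)
    if problem:
        # Assumed convention: a stdout line starting with ERROR marks a failure.
        print(problem)
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

A script like this only reports the problem. Changing the node's state is left to the scheduler or to an external job such as a cron script.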