[torqueusers] Sharing your Compute Node Health Check scripts
Ole Holm Nielsen
Ole.H.Nielsen at fysik.dtu.dk
Mon Aug 23 09:00:04 MDT 2010
Could anyone who has implemented a node_check_script kindly share their
know-how with us and/or the list?
We would like to implement the Torque Compute Node Health Check script feature described at
http://www.clusterresources.com/torquedocs21/10.2healthcheck.shtml
How do people check for health problems such as:
* various ways that disk failures can cripple a file system or a swap partition
* memory (RAM) errors
* out-of-memory conditions
* disk full conditions
* other stuff?
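
To make the question concrete, here is a minimal sketch of the kind of script we have in mind, not a tested production check. It assumes a Linux node with /proc/meminfo, and it assumes (as the documentation linked above appears to describe) that pbs_mom treats script output beginning with the keyword ERROR as a failed health check. The thresholds, the list of checked paths, and all function names are hypothetical. It covers only the disk-full, low-memory and missing-swap cases; detecting failing disks or RAM errors would need hardware-specific tools and is left out.

```python
#!/usr/bin/env python3
"""Sketch of a Torque node health check; thresholds and paths are assumptions."""
import shutil

# Hypothetical site-specific settings; adjust before any real use.
MIN_FREE_DISK_FRACTION = 0.05        # flag file systems with less than 5% free
MIN_AVAILABLE_MEM_KB = 256 * 1024    # flag nodes with less than 256 MiB available
CHECKED_PATHS = ["/", "/tmp"]        # file systems to inspect


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (mostly kB)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def check_disk(path, min_free_fraction=MIN_FREE_DISK_FRACTION):
    """Return a problem string if the file system holding path is nearly full."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return f"cannot stat {path}: {exc.strerror}"
    if usage.total > 0 and usage.free / usage.total < min_free_fraction:
        return f"{path} nearly full ({usage.free // 2**20} MiB free)"
    return None


def check_memory(meminfo, min_available_kb=MIN_AVAILABLE_MEM_KB):
    """Return a problem string if available memory is below the threshold."""
    # MemAvailable is missing on older kernels, so fall back to MemFree.
    available = meminfo.get("MemAvailable", meminfo.get("MemFree"))
    if available is None:
        return "cannot determine available memory"
    if available < min_available_kb:
        return f"low memory ({available} kB available)"
    return None


def check_swap(meminfo):
    """Return a problem string if no swap is active (assumes nodes should have swap)."""
    if meminfo.get("SwapTotal", 0) == 0:
        return "no active swap"
    return None


def format_report(problems):
    """Build the single line printed to stdout; empty string means healthy."""
    if not problems:
        return ""
    return "ERROR " + "; ".join(problems)


def main():
    try:
        with open("/proc/meminfo") as handle:
            meminfo = parse_meminfo(handle.read())
    except OSError:
        meminfo = {}  # the memory and swap checks below will then report problems
    results = [check_disk(path) for path in CHECKED_PATHS]
    results += [check_memory(meminfo), check_swap(meminfo)]
    report = format_report([r for r in results if r])
    if report:
        print(report)


if __name__ == "__main__":
    main()
```

Each check returns either None or a short description, so further checks can be added by appending one more function call to the list in main().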
I've asked the list about this before, but received zero responses :-(
I hope for better luck this time...
Thanks a lot,
Ole
Ole Holm Nielsen
Department of Physics, Technical University of Denmark