[torqueusers] Sharing your Compute Node Health Check scripts

Ole Holm Nielsen Ole.H.Nielsen at fysik.dtu.dk
Mon Aug 23 09:00:04 MDT 2010


Could anyone who has implemented a node_check_script kindly share their know-how
with us and/or the list?

We would like to implement the Torque Compute Node Health Check script feature described at
http://www.clusterresources.com/torquedocs21/10.2healthcheck.shtml

How do people check for health problems such as:
* various ways that disk failures can cripple a file system or a swap partition
* RAM errors
* out-of-memory conditions
* disk full conditions
* other stuff?
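
To make the question concrete, here is a minimal Python sketch of the kind of
script meant, covering only the disk-full, out-of-memory and missing-swap items.
It assumes, per the health check documentation linked above, that pbs_mom flags
a node when the script prints a line beginning with ERROR. The function names,
the paths checked and the thresholds (5% free disk, 100 MB available memory) are
illustrative assumptions, not recommendations. RAM errors and failing disks are
not covered here.

```python
#!/usr/bin/env python3
"""Sketch of a node health check; paths and thresholds are assumptions."""
import shutil


def disk_problem(path, total, free, min_free_fraction):
    """Return a message if the free fraction is below the limit, else None."""
    if total <= 0:
        return None  # pseudo file systems may report a size of zero
    free_fraction = free / total
    if free_fraction < min_free_fraction:
        return "%s is %.0f%% full" % (path, 100 * (1 - free_fraction))
    return None


def parse_meminfo(text):
    """Parse the content of /proc/meminfo into a dict of integer kB values."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def memory_problems(meminfo, min_available_kb):
    """Return messages for missing swap and for low available memory."""
    problems = []
    if meminfo.get("SwapTotal", 0) == 0:
        problems.append("no active swap")
    available = meminfo.get("MemAvailable")
    if available is None:
        # Older kernels have no MemAvailable field; use a rough estimate.
        available = sum(meminfo.get(k, 0) for k in ("MemFree", "Buffers", "Cached"))
    if available < min_available_kb:
        problems.append("only %d kB of memory available" % available)
    return problems


def report(problems):
    """Build the single output line; an empty string means the node is healthy."""
    return "ERROR " + "; ".join(problems) if problems else ""


def main():
    problems = []
    for path in ("/", "/tmp"):  # hypothetical list of file systems to watch
        try:
            usage = shutil.disk_usage(path)
        except OSError as exc:
            problems.append("cannot stat %s: %s" % (path, exc))
            continue
        message = disk_problem(path, usage.total, usage.free, 0.05)
        if message:
            problems.append(message)
    try:
        with open("/proc/meminfo") as handle:
            problems.extend(memory_problems(parse_meminfo(handle.read()), 100 * 1024))
    except OSError as exc:
        problems.append("cannot read /proc/meminfo: %s" % exc)
    line = report(problems)
    if line:
        print(line)


if __name__ == "__main__":
    main()
```

Such a script would be referenced from the pbs_mom configuration through the
$node_check_script and $node_check_interval parameters described in the
documentation linked above.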

I've asked the list about this before, but received zero responses :-(
I hope for better luck this time...

Thanks a lot,
Ole

Ole Holm Nielsen
Department of Physics, Technical University of Denmark

