[torqueusers] node_check_script with node_check_interval=jobstart only executed on first node

Thomas Zeiser thomas.zeiser at rrze.uni-erlangen.de
Wed Nov 27 04:49:01 MST 2013


Hello,

we have the following in our mom_priv/config:
  $node_check_script /var/spool/torque/mom_priv/health-check.sh
  $node_check_interval 0,jobstart

However, it looks like the health-check script is only executed on
the first node of a multi-node job, i.e. only on the node with the
Mother Superior. I would expect it to be run at (before) jobstart
on EVERY node of the job.
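
To see on which nodes the script actually runs, a minimal sketch
like the following could serve as health-check.sh. The log path
(HEALTH_LOG) and the function name are assumptions made for
illustration only; they are not TORQUE defaults. It appends one
line per invocation to a node-local file, so comparing the files
on all nodes of a job shows where pbs_mom called it.

```shell
#!/bin/sh
# Hypothetical health-check sketch: record every invocation so one can
# see on which nodes (and when) pbs_mom really runs the script.
# HEALTH_LOG is an assumed, node-local path, not a TORQUE default.
HEALTH_LOG="${HEALTH_LOG:-/tmp/health-check.log}"

log_invocation() {
    # One line per run: timestamp and node name.
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$(uname -n)" >> "$HEALTH_LOG"
}

# Print nothing on stdout: the node is treated as healthy here.
log_invocation
```

If only the Mother Superior's file gains a line at job start, the
sister nodes did not run the check.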


Moreover, there are some inconsistencies in the documentation:
1) http://docs.adaptivecomputing.com/torque/Content/topics/commands/pbs_mom.htm
   "The message (up to 256 characters) immediately following the
   Error string"
2) http://docs.adaptivecomputing.com/torque/Content/topics/11-troubleshooting/creatingHealthCheckScript.htm
   "The message (up to 1024 characters) immediately following the
   ERROR keyword"
=> "Error" vs. "ERROR"; length 256 vs. 1024 characters
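
Until the documentation is consistent, a health-check script could
be written defensively. The helper below is only a sketch under two
assumptions: that the upper-case keyword is accepted, and that the
smaller limit (256) is the safe one. The names report_error and
MAX_MSG_LEN are made up for this example.

```shell
# Hypothetical helper: report a node error in a form that should fit
# both documentation pages (upper-case keyword, message cut to 256).
# Assumes a plain ASCII message, since the cut is done in bytes.
MAX_MSG_LEN=256

report_error() {
    # Join all arguments into one message and truncate it.
    printf "ERROR %.${MAX_MSG_LEN}s\n" "$*"
}
```

A check would then call, for example, report_error "scratch file
system not mounted" and leave the rest to pbs_mom.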


Best,

Thomas

