[torqueusers] Torque behavior with failed nodes
ppadala at eecs.umich.edu
Thu Jul 28 20:51:59 MDT 2005
I am trying to understand Torque's behavior when a node fails. From
reading the source, I understand that check_nodes marks the node as
down by setting its state to INUSE_DOWN, but I don't see any code
that moves the jobs elsewhere. What happens to the jobs running on
that node? Is the scheduler told about the failed node?
Any input is greatly appreciated.