[torqueusers] Torque behavior with failed nodes
Pradeep Padala
ppadala at eecs.umich.edu
Thu Jul 28 20:51:59 MDT 2005
Hi,
I am trying to understand Torque's behavior when a node fails. From
reading the source, I see that check_nodes marks a failed node as down
by setting its state to INUSE_DOWN, but I don't see any code that moves
its jobs elsewhere. What happens to the jobs running on that node? Is
the scheduler told about the failed node?
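To make the question concrete, here is a toy Python model of the
behavior as I read it. It is not Torque code: only the names check_nodes
and INUSE_DOWN come from the source, while the Node class, the timeout,
the flag value, and the job ID are made up for illustration.

```python
from dataclasses import dataclass, field

# Placeholder bit value; only the name is taken from the Torque source.
INUSE_DOWN = 0x02


@dataclass
class Node:
    """Hypothetical stand-in for the server's per-node record."""
    name: str
    last_update: float                         # time of the last status report
    state: int = 0                             # bit flags; 0 means "free" here
    jobs: list = field(default_factory=list)   # IDs of jobs assigned to the node


def check_nodes(nodes, now, timeout=300.0):
    """Flag every node that has not reported within `timeout` seconds as down.

    Returns the nodes that were newly marked down. Note that nothing here
    touches node.jobs, which mirrors what I think I see in the source.
    """
    newly_down = []
    for node in nodes:
        stale = now - node.last_update > timeout
        if stale and not node.state & INUSE_DOWN:
            node.state |= INUSE_DOWN
            newly_down.append(node)
    return newly_down


nodes = [
    Node("node01", last_update=1000.0, jobs=["123.server"]),
    Node("node02", last_update=1290.0),
]
down = check_nodes(nodes, now=1400.0)
```

After the call, node01 is flagged as down but its job list is untouched,
which is exactly the gap I am asking about.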
Any input is greatly appreciated.
Thanks,
--
Pradeep Padala
http://ppadala.blogspot.com