[torqueusers] Torque behavior with failed nodes

Pradeep Padala ppadala at eecs.umich.edu
Thu Jul 28 20:51:59 MDT 2005


Hi,
    I am trying to understand Torque's behavior when a node fails. From
reading the source, I see that check_nodes marks the node as down by
setting its state to INUSE_DOWN, but I can't find any code that moves
the jobs elsewhere. What happens to the jobs running on that node? Is
the scheduler told about the failed node?

    Any input is greatly appreciated.

Thanks,
-- 
Pradeep Padala
http://ppadala.blogspot.com

