[torqueusers] Requeue if node down?
brett at vpac.org
Thu Aug 12 17:28:35 MDT 2010
----- "Joshua Bernstein" <jbernstein at penguincomputing.com> wrote:
> Hi Simon,
> In my experience I've found that the job won't actually get properly
> requeued until the downed node comes back up again and reports the job
> is dead. Only then will pbs_server requeue it. I don't think there is
> an option to auto-requeue after some period of time.
And indeed, auto-requeueing could cause problems if it is the job itself that is killing the node.
Job spawns, kills node, node goes down.
Scheduler notices, requeues job on another node.
Repeat until entire cluster is down.
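To make the cascade concrete, here is a rough toy model in Python. It is only a sketch, not anything TORQUE does: the function name simulate_requeue and the max_requeues cap are hypothetical, made up purely to illustrate the scenario, and the model assumes the job kills every node it lands on.

```python
def simulate_requeue(num_nodes, max_requeues=None):
    """Return how many nodes a node-killing job takes down.

    max_requeues=None models unconditional auto-requeue. An integer
    caps how often the job may be requeued (a hypothetical policy,
    not an existing TORQUE setting).
    """
    up_nodes = list(range(num_nodes))
    downed = 0
    requeues = 0
    while up_nodes:
        up_nodes.pop(0)  # job spawns on a node and kills it
        downed += 1
        if max_requeues is not None and requeues >= max_requeues:
            break  # cap reached: the job is held instead of requeued
        requeues += 1  # scheduler notices, requeues job on another node
    return downed


if __name__ == "__main__":
    print(simulate_requeue(8))                  # unconditional requeue
    print(simulate_requeue(8, max_requeues=2))  # capped requeue
```

With unconditional requeue the job works its way through every node. With a cap, the damage is limited to the first run plus the allowed number of retries.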
Brett Pemberton - VPAC HPC Team Leader
http://www.vpac.org/ - (03) 9925 4899