[Mauiusers] Reallocating jobs when a node crashes
Nathalia Garces Ferreira
n.garces26 at uniandes.edu.co
Wed May 23 08:47:11 MDT 2012
Good day all,
I have installed Torque/Maui for an opportunistic grid based on gLite 3.2. I
have a problem when a node that is running a job crashes. After a few
moments PBS seems to realize that the node is down and that the job
cannot continue, but it doesn't relaunch it. The job still appears as
Running on the down resource. I've tried to relaunch or reallocate it
manually, but the scheduler doesn't let me. When I cancel it by force, the
job stays in either the Deferred or BatchHold state, and neither the
releasehold nor the qrls command changes it.
I've tried to find a way to configure, via Torque or Maui, a parameter to
relaunch a job that is allocated on a down resource, but I only found Moab
parameters.
Can you help me out? I would very much appreciate your help.
Nathalia