[torquedev] pbs_mom -p and rerunnable jobs
Wendy Lin
HCLin at lbl.gov
Tue Jan 12 09:17:32 MST 2010
>> If pbs_mom is restarted with the -q option and there are no jobs
>> still running pbs_server will re-queue the jobs so they can be run
>> again later. If jobs are still running when pbs_mom is started
>> with -q, the jobs are terminated, go to an E state but are then re-
>> queued so they can be run again.
I briefly tested -q, but it did not get me what I wanted. If you
believe -q will allow rerunnable jobs to run again after the MOM node
is rebooted following a crash, I'll redo my test.
>> What was the default behavior previously?
From the Torque 2.3.7 man page:
----
Normally the mini-server is started from the system boot file without
the -p or the -r option. The mini-server will make no attempt to
signal the former session of any job which may have been running
when the mini-server terminated. It is assumed that on reboot, all
processes have been killed. The MOM will mark the jobs as
terminated and notify the batch server which owns the job.
----
From the OpenPBS man page:
----
Normally the mini-server is started from the system boot file without
the -p or the -r option. The mini-server will make no attempt to
signal the former session of any job which may have been running when
the mini-server terminated. It is assumed that on reboot, all
processes have been killed.
----
It looks like Torque added the part where the MOM marks the jobs as
terminated and sends the obit to the server, which I believe is the
problem.
--
Wendy Lin
hclin at lbl.gov