[torquedev] pbs_mom -p and rerunnable jobs

Wendy Lin HCLin at lbl.gov
Tue Jan 12 09:17:32 MST 2010


>> If pbs_mom is restarted with the -q option and there are no jobs
>> still running, pbs_server will re-queue the jobs so they can be run
>> again later. If jobs are still running when pbs_mom is started
>> with -q, the jobs are terminated, go to an E state, but are then
>> re-queued so they can be run again.


I briefly tested -q, and it did not give me what I wanted. If you
believe -q will allow rerunnable jobs to run again after the MOM node
is rebooted following a crash, I'll redo my test.


>> What was the default behavior previously?


From the Torque 2.3.7 man page:


----
Normally the mini-server is started from the system boot file without
the -p or the -r option. The mini-server will make no attempt to
signal the former session of any job which may have been running
when the mini-server terminated. It is assumed that on reboot, all
processes have been killed. The MOM will mark the jobs as
terminated and notify the batch server which owns the job.
----

From the OpenPBS man page:

----
Normally the mini-server is started from the system boot file without  
the -p or the -r option. The mini-server will make no attempt to  
signal the former session of any job which may have been running when  
the mini-server terminated. It is assumed that on reboot, all  
processes have been killed.
----

It looks like Torque added the step where the MOM marks the jobs as
terminated and sends an obit to the server, which I believe is the
problem.
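To make the comparison concrete, here is a minimal Python sketch that encodes the three start-up cases discussed above as a lookup. The names `MomStart`, `Outcome` and `after_restart` are hypothetical and do not come from the TORQUE source. The `rerun` value for the two default modes encodes the hypothesis above (that the obit is what keeps a rerunnable job from running again), not documented behaviour, and the `-q` case follows the quoted description rather than my brief test, which did not confirm it.

```python
from enum import Enum
from typing import NamedTuple, Optional


class MomStart(Enum):
    # Hypothetical labels for the pbs_mom start-up cases compared in this thread.
    TORQUE_DEFAULT = "Torque 2.3.7, no -p/-r"
    OPENPBS_DEFAULT = "OpenPBS, no -p/-r"
    TORQUE_Q = "Torque, -q"


class Outcome(NamedTuple):
    kills_old_jobs: bool            # does pbs_mom terminate jobs from the old session?
    mom_sends_obit: Optional[bool]  # None = not stated in the quoted text
    rerun: bool                     # can a rerunnable job run again?


def after_restart(mode: MomStart, jobs_still_running: bool) -> Outcome:
    """Encode the behaviour quoted in this thread; this is not TORQUE source code."""
    if mode is MomStart.TORQUE_Q:
        # Quoted -q description: running jobs are terminated, pass through E,
        # then are re-queued; with nothing running, the server re-queues them.
        # (Assumption: taken from the reply, not confirmed by testing.)
        return Outcome(kills_old_jobs=jobs_still_running,
                       mom_sends_obit=None,
                       rerun=True)
    # Both man pages: on a normal start, no attempt is made to signal the old session.
    # Only the Torque page says the MOM marks jobs terminated and notifies the server.
    obit = mode is MomStart.TORQUE_DEFAULT
    # Hypothesis from this message: the obit is what stops the job from rerunning.
    return Outcome(kills_old_jobs=False, mom_sends_obit=obit, rerun=not obit)


if __name__ == "__main__":
    for mode in MomStart:
        print(mode.value, after_restart(mode, jobs_still_running=False))
```

The `Optional[bool]` field is deliberate: where the quoted text says nothing, the sketch records "unknown" instead of guessing.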

-- 
Wendy Lin
hclin at lbl.gov





