[torqueusers] pbs_mom request, was Re: PBS_MOM kills running jobs when restarted

Wendy Lin HCLin at lbl.gov
Thu Dec 17 08:48:12 MST 2009


>> Unfortunately, at least with the version of Torque we use, not only  
>> -p  is the default but also there is no way (that I know of) to get  
>> back  the original default behavior, i.e. don't do anything about  
>> previous  jobs when it starts, leave it to the server to decide  
>> whether to purge  or rerun them. I have tried the "-q" setting, it  
>> did not do any better
>
> From my reading of your description you are saying there is  
> currently no way for pbs_mom to start up, mark the jobs as no longer  
> running, and for them to be rescheduled if they are marked as  
> "rerunable".


I believe all it takes is for pbs_mom not to try to be smart about its
ex-children, and not to communicate with the batch server about them.
The server is designed to handle the situation.


> There should be a way to do this - and it shouldn't matter if  
> pbs_mom started at boot time automatically, or manually by some  
> admin later.


Yes, and I think diverting the discussion of "pbs_mom -p" to whether
to start pbs_mom at boot time downplays the importance of this
annoying new default behavior. I would have preferred that the subject
line be changed to something else for that discussion.


> Isn't that what the rerunable flag is for? Is this a 2.4.x only issue?


We are running 2.4.1b; I don't know about other versions. We got it
from Cray, so we'll open a bug with Cray.

-- 
Wendy Lin
hclin at lbl.gov





