[torqueusers] pbs_mom request, was Re: PBS_MOM kills running jobs when restarted
Wendy Lin
HCLin at lbl.gov
Thu Dec 17 08:48:12 MST 2009
>> Unfortunately, at least with the version of Torque we use, not only
>> is -p the default, but there is also no way (that I know of) to get
>> back the original default behavior, i.e. don't do anything about
>> previous jobs when it starts, and leave it to the server to decide
>> whether to purge or rerun them. I have tried the "-q" setting, but
>> it did not do any better.
>
> From my reading of your description you are saying there is
> currently no way for pbs_mom to start up, mark the jobs as no longer
> running, and for them to be rescheduled if they are marked as
> "rerunable".
I believe all it takes is for pbs_mom not to try to be smart about its
ex-children, and not to communicate with the batch server about them.
The server is designed to handle the situation.
> There should be a way to do this - and it shouldn't matter if
> pbs_mom started at boot time automatically, or manually by some
> admin later.
Yes, and I think diverting the discussion of "pbs_mom -p" to whether
to start pbs_mom at boot time downplays the importance of this
annoying new default behavior. I would have liked it better if the
subject line had been changed to something else for that discussion.
> Isn't that what the rerunable flag is for? Is this a 2.4.x only issue?
We are running 2.4.1b; I don't know about other versions. We got it
from Cray, so we'll open a bug with Cray.
--
Wendy Lin
hclin at lbl.gov