[torqueusers] Job Nanny Poll
stevejones at stanford.edu
Mon Nov 21 12:22:00 MST 2011
----- Original Message -----
> On Mon, Nov 21, 2011 at 2:06 PM, David Beer
> <dbeer at adaptivecomputing.com> wrote:
> > All,
> > Just a quick poll question - do people use the job delete nanny
> > functionality in TORQUE? If you do, in qmgr you would have the line:
> > set job_nanny = True
> > I'm curious how many people are using it - this seems like very
> > repetitive functionality to me (pbs_mom does pretty much the same
> > thing already) and I personally think job_force_cancel_time is
> > better, but I may be biased.
> I use the job delete nanny, but I am not familiar with
> job_force_cancel_time. I have been using the job delete nanny for a
> long time.
> What exactly does it do? I presume some of the multi-threading in
> pbs_server in TORQUE 4.0 can clean up some of this code a little since
> pbs_server can spawn a thread to hang out and manage the job delete
> (rather than needing to set a work task to check the status of the
> delete in the future).
I'm also using job_nanny; this is the first I've heard of
job_force_cancel_time. After a quick search, it looks like it might
have been undocumented for a short period. It takes an int, but there
is no recommended value; maybe 300? Is this the automation of
'qdel -p [jobid]' or similar for jobs *stuck* when a node stops
responding?