[torqueusers] Re: kill_delay
garrick at clusterresources.com
Sun Feb 25 18:51:07 MST 2007
On Sun, Feb 25, 2007 at 12:07:49AM +0100, Roy Dragseth alleged:
> This is hardcoded into src/resmom/linux/mom_mach.c around line 1910. The kill
> procedure iterates up to 20 times with a nanosleep call of 0.25 seconds
> before it continues and does a hard kill on the process. I do not know why
> this has to be there; the -TERM and -KILL signalling should be left to the
> discretion of pbs_server, which has the kill_delay variable. I do not think
> the kill_delay variable is forwarded to the moms.
That loop has always really bugged me. If you do something largish
with TM, like launch >1000 tasks, that loop takes forever to complete.
I don't know exactly why or when it was added, but OpenPBS didn't have
it. I can easily imagine someone was trying to make pbs_mom very
thorough in the art of process massacre.
That said, kill_delay does actually work correctly. pbs_server will
send out KILLs after kill_delay seconds if the job hasn't exited.
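
For anyone who hasn't read that part of mom_mach.c, here is a rough sketch of
the pattern Roy describes: TERM, poll up to N times with a short nanosleep,
then KILL. This is not the TORQUE source. term_then_kill() and
worst_case_wait_ms() are made-up names, and the existence check via
kill(pid, 0) is my assumption about how one could do it, not what pbs_mom
actually does. The second helper only does the arithmetic for the worst case,
assuming tasks are handled one after another and every one of them ignores
TERM.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical helper, NOT TORQUE code.
 * Sends SIGTERM, then checks up to max_polls times (sleeping poll_ns
 * nanoseconds between checks) whether the process is gone, and finally
 * falls back to SIGKILL.
 * Returns 0 if the process went away without SIGKILL, 1 if SIGKILL was
 * sent, -1 on any other error.
 * Caveat: kill(pid, 0) still succeeds for an unreaped zombie child, so a
 * real implementation would also need to reap or inspect process state. */
int term_then_kill(pid_t pid, int max_polls, long poll_ns)
{
    assert(max_polls >= 0 && poll_ns >= 0 && poll_ns < 1000000000L);
    struct timespec delay = { .tv_sec = 0, .tv_nsec = poll_ns };

    if (kill(pid, SIGTERM) == -1)
        return (errno == ESRCH) ? 0 : -1;

    for (int i = 0; i < max_polls; i++) {
        /* Signal 0 performs the permission/existence check only. */
        if (kill(pid, 0) == -1 && errno == ESRCH)
            return 0;
        nanosleep(&delay, NULL);
    }

    if (kill(pid, SIGKILL) == -1)
        return (errno == ESRCH) ? 0 : -1;
    return 1;
}

/* Worst-case total wait in milliseconds, ASSUMING tasks are processed
 * sequentially and each one survives every poll. */
long worst_case_wait_ms(long tasks, int max_polls, long poll_ms)
{
    return tasks * max_polls * poll_ms;
}
```

With the numbers from Roy's mail, term_then_kill(pid, 20, 250000000L) waits at
most 20 x 0.25 s = 5 s per process. Under the sequential assumption above,
worst_case_wait_ms(1000, 20, 250) comes to 5,000,000 ms, i.e. about 83 minutes
for 1000 stubborn tasks.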