[torquedev] Is kill_delay broken?

Michael Barnes barnes at jlab.org
Tue Mar 31 14:44:35 MDT 2009


On Tue, Mar 31, 2009 at 01:13:42PM -0600, Josh Butikofer wrote:
> Everyone,
> 
> We've had a customer report a possible regression in how the pbs_server
> attribute "kill_delay" is supposed to work. This is what the man page says 
> about it:
> 
> kill_delay
>     The amount of the time delay between the sending of SIGTERM and
>     SIGKILL when a qdel command is issued against a running job. This
>     is overridden by the execution queue attribute of the same name.
>     Format: integer seconds; default value: 2 seconds.
> 
> In other words, kill_delay controls when the pbs_server sends a SIGKILL to a
> job. For example, when qdel is used on a running job, the pbs_server sends a
> SIGTERM to the job immediately. The server then adds an internal task to
> send the SIGKILL, but puts a time on it <kill_delay> seconds in the future.
> 
> When the MOM gets the SIGTERM request, it passes that signal on to all of
> the tasks in the job's session. For example, our typical test job has three
> tasks in the job's session:
> 
> root     11147     1  0 Mar20 ?        00:01:08 pbs_mom
> ...
> josh     26482 11147  0 12:59 ?        00:00:00 -bash
> josh     26483 26482  0 12:59 ?        00:00:00 -bash
> josh     26484 26483  0 12:59 ?        00:00:00 /home/josh/sigtest
> 
> 
> The sigtest task/process has a handler to catch and ignore the SIGTERM, but
> that is not true for bash. This means bash is killed immediately.
>
> Next, the MOM runs scan_for_terminated(), sees the -bash task terminate,
> and then does several things, one of which is to call kill_task() with a
> SIGKILL. kill_task() then issues a SIGKILL for any pid that is still in the
> /proc table and matches the session ID. This then kills sigtest *early*. In
> other words, kill_delay is subverted because the pbs_mom sends a SIGKILL
> before the server tells it to. This seems to make kill_delay, well,
> useless. :)
> 
> Does anyone out there know if this is a regression? In an effort to make the
> pbs_mom more tidy, did we inadvertently break kill_delay's intended
> functionality? Or am I perhaps missing something? Are there cluster admins
> out there that use kill_delay successfully?
> 
> BTW, this test was done in TORQUE 2.3.x on Linux.

AFAIK, the session ID is the process ID of the program that the pbs_mom
runs on behalf of the user, and the pbs_mom does not walk down the
process tree beyond the session ID when it sends a signal to that ID.

So, in the above example we have:

    pbs_mom (11147) -> bash (26482) -> bash (26483) -> sigtest (26484)
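
To illustrate the session ID point: on POSIX systems, a process that
calls setsid() becomes a session leader, and its session ID is then equal
to its own PID. Below is a minimal Python sketch of that relationship. It
is not TORQUE code; REPORT and leader_pid_and_sid() are hypothetical names,
and start_new_session=True is only assumed to stand in for whatever the
pbs_mom does when it launches the job's top-level shell.

```python
import os
import subprocess
import sys

# Hypothetical child program: prints its own PID and its session ID.
REPORT = "import os; print(os.getpid(), os.getsid(0))"


def leader_pid_and_sid():
    """Start a child in a new session and return (pid, sid) as it sees them."""
    result = subprocess.run(
        [sys.executable, "-c", REPORT],
        start_new_session=True,  # child calls setsid() before exec
        capture_output=True,
        text=True,
        check=True,
    )
    pid, sid = map(int, result.stdout.split())
    return pid, sid


if __name__ == "__main__":
    pid, sid = leader_pid_and_sid()
    print("child pid:", pid, "child session id:", sid)
    print("our own session id:", os.getsid(0))
```

Processes the session leader forks later inherit the same session ID
unless they call setsid() themselves, which is why matching on the session
ID can find sigtest even after both bash processes are gone.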

When pbs_mom sends a TERM signal to 26482, the signal is sent to all of
its children as well.  So, the bash process being killed makes sense.
And then sigtest would be a stray process, which is a common problem on
clusters.  That is why there is a delay between TERM and KILL: KILL
cannot be trapped or passed down the process group, and once a process
is KILLed, its child processes are no longer under the pbs_mom's
control, but under init's control.
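
The TERM-then-KILL sequence that kill_delay is meant to provide can be
sketched roughly as follows. This is a Python illustration only, not how
pbs_server or pbs_mom implement it. CHILD is a hypothetical stand-in for
sigtest, and term_then_kill() and demo() are names made up for this
example. It assumes a POSIX system.

```python
import os
import signal
import subprocess
import sys
import time

# Hypothetical stand-in for sigtest: ignores SIGTERM, then sleeps.
CHILD = (
    "import signal, time; "
    "signal.signal(signal.SIGTERM, signal.SIG_IGN); "
    "print('ready', flush=True); "
    "time.sleep(60)"
)


def term_then_kill(proc, kill_delay):
    """SIGTERM the process group, then SIGKILL it after kill_delay seconds.

    Returns the number of seconds the process survived after SIGTERM.
    """
    pgid = os.getpgid(proc.pid)
    start = time.monotonic()
    os.killpg(pgid, signal.SIGTERM)
    try:
        proc.wait(timeout=kill_delay)  # grace period for cleanup
    except subprocess.TimeoutExpired:
        os.killpg(pgid, signal.SIGKILL)  # cannot be caught or ignored
        proc.wait()
    return time.monotonic() - start


def demo(kill_delay=1.0):
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdout=subprocess.PIPE,
        start_new_session=True,  # own session and process group
    )
    proc.stdout.readline()  # wait until the SIGTERM handler is installed
    survived = term_then_kill(proc, kill_delay)
    proc.stdout.close()
    return survived, proc.returncode


if __name__ == "__main__":
    survived, returncode = demo(kill_delay=1.0)
    print("survived %.2fs after TERM, return code %d" % (survived, returncode))
```

A child that ignores SIGTERM should survive for the full delay here and
then die from SIGKILL. The behavior Josh describes is the case where
something else sends the SIGKILL before the delay has elapsed.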

It would be great if the pbs_mom kept better control of all of its
subprocesses, but I can't think of an efficient and reliable way of
doing this.  (Common dilemma for moms)

As for the kill delay, I've tested this on older TORQUE versions, and it
worked fine.  I believe it also shows up in the mom logs, which say
something like sending TERM signal, then sending KILL signal.

-mb

-- 
+-----------------------------------------------
| Michael Barnes
|
| Thomas Jefferson National Accelerator Facility
| 12000 Jefferson Ave.
| Newport News, VA 23606
| (757) 269-7634
+-----------------------------------------------

