[torquedev] Is kill_delay broken?
barnes at jlab.org
Tue Mar 31 14:44:35 MDT 2009
On Tue, Mar 31, 2009 at 01:13:42PM -0600, Josh Butikofer wrote:
> We've had a customer report a possible regression in how the pbs_server
> attribute "kill_delay" is supposed to work. This is what the man page says
> about it:
> The amount of the time delay between the sending of SIGTERM and
> SIGKILL when a qdel command is issued against a running job. This
> is overriden by the execution queue attribute of the same name.
> Format: integer seconds; default value: 2 seconds.
> In other words, kill_delay controls when the pbs_server sends a SIGKILL to a
> job. For example, when qdel is used on a running job, the pbs_server sends a
> SIGTERM to the job immediately. The server then adds an internal task to
> send the SIGKILL, but puts a time on it <kill_delay> seconds in the future.
> When the MOM gets the SIGTERM request, it passes that signal on to all of
> the tasks in the job's session. For example, our typical test job has three
> tasks in the job's session:
> root 11147 1 0 Mar20 ? 00:01:08 pbs_mom
> josh 26482 11147 0 12:59 ? 00:00:00 -bash
> josh 26483 26482 0 12:59 ? 00:00:00 -bash
> josh 26484 26483 0 12:59 ? 00:00:00 /home/josh/sigtest
> The sigtest task/process has a handler to catch and ignore the SIGTERM, but
> this is not true for bash. This means bash is killed immediately.
> Next, the MOM runs scan_for_terminated(), sees the -bash task terminate,
> and then does several things, one of which is to call kill_task() with a
> SIGKILL. kill_task() then issues a SIGKILL for any pid that is still in the
> /proc table and matches the session ID. This then kills sigtest *early*. In
> other words, kill_delay is subverted because the pbs_mom sends a SIGKILL
> before the pbs_server tells it to. This seems to make kill_delay, well,
> useless. :)
> Does anyone out there know if this is a regression? In an effort to make the
> pbs_mom more tidy, did we inadvertently break kill_delay's intended
> functionality? Or am I perhaps missing something? Are there cluster admins
> out there that use kill_delay successfully?
> BTW, this test was done in TORQUE 2.3.x on Linux.
AFAIK, the session ID is the process ID of the program that the pbs_mom
runs on behalf of the user, and the pbs_mom does not walk down the
process tree beyond the session ID when it sends a signal to that ID.
So, in the above example we have pbs_mom (11147) -> bash (26482) -> bash
(26483) -> sigtest (26484)
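
To illustrate the session ID relationship described here, a minimal Python
sketch follows. It is not TORQUE code: the helper name show_session and the
inline child script are made up for illustration, and it assumes the job's
top-level process is started as a session leader. It checks that the leader's
PID equals its session ID and that a grandchild inherits the same session ID.

```python
import subprocess
import sys

# Hypothetical stand-in for the job's top-level shell: it prints its own
# PID and session ID, then starts a grandchild that prints its session ID.
CHILD = '''
import os, subprocess, sys
print(os.getpid(), os.getsid(0), flush=True)
subprocess.run([sys.executable, "-c",
                "import os; print(os.getsid(0), flush=True)"])
'''


def show_session():
    """Return (leader_pid, leader_sid, grandchild_sid) for a new session."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        start_new_session=True,  # child calls setsid() and leads a new session
        capture_output=True,
        text=True,
        check=True,
    )
    leader_pid, leader_sid, grandchild_sid = map(int, result.stdout.split())
    return leader_pid, leader_sid, grandchild_sid


if __name__ == "__main__":
    pid, sid, gsid = show_session()
    print(f"leader pid={pid} sid={sid}, grandchild sid={gsid}")
```

All three numbers should match, which is why a scan of the process table for
a given session ID can find processes further down the tree.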
When pbs_mom sends a TERM signal to 26482, the signal is sent to all of
its children as well. So, the bash process being killed makes sense.
And then sigtest would be a stray process, which is a common problem on
clusters. That is why there is a delay between TERM and KILL, because
KILL cannot be trapped or passed down the process group, and once a
process is KILLed, its child processes are no longer under the pbs_mom's
control, but under init's.
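
As a rough illustration of the TERM, wait, KILL sequence, here is a small
Python sketch. It is an assumption-laden demo, not how pbs_mom is implemented:
the function name term_then_kill and the STUBBORN script (a stand-in for
sigtest that ignores SIGTERM) are hypothetical, and signalling the whole
process group with os.killpg is a choice made for this demo.

```python
import os
import signal
import subprocess
import sys
import time

# Hypothetical stand-in for sigtest: ignore SIGTERM, report ready, then sleep.
STUBBORN = (
    "import signal, time; "
    "signal.signal(signal.SIGTERM, signal.SIG_IGN); "
    "print('ready', flush=True); "
    "time.sleep(60)"
)


def term_then_kill(kill_delay=2.0):
    """Send SIGTERM, wait up to kill_delay seconds, then send SIGKILL.

    Returns (survived_term, returncode).
    """
    proc = subprocess.Popen(
        [sys.executable, "-c", STUBBORN],
        start_new_session=True,  # own session and process group
        stdout=subprocess.PIPE,
        text=True,
    )
    proc.stdout.readline()  # block until the SIGTERM handler is installed
    pgid = os.getpgid(proc.pid)

    os.killpg(pgid, signal.SIGTERM)  # polite request to the whole group
    try:
        proc.wait(timeout=kill_delay)
        survived = False
    except subprocess.TimeoutExpired:
        survived = True
        os.killpg(pgid, signal.SIGKILL)  # cannot be caught or ignored
        proc.wait()
    proc.stdout.close()
    return survived, proc.returncode


if __name__ == "__main__":
    start = time.monotonic()
    survived, rc = term_then_kill(kill_delay=2.0)
    print(f"survived TERM: {survived}, returncode: {rc}, "
          f"elapsed: {time.monotonic() - start:.1f}s")
```

Because the stand-in ignores SIGTERM, it should live for the full delay and
exit only on the SIGKILL, which subprocess reports as a negative return code.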
It would be great if the pbs_mom kept better control of all of its
subprocesses, but I can't think of an efficient and reliable way of
doing this. (Common dilemma for moms)
With the kill delay, I've tested this on older TORQUE versions, and it
worked fine. I believe it's also in the mom logs. It says something
like sending TERM signal, then sending KILL signal.
| Michael Barnes
| Thomas Jefferson National Accelerator Facility
| 12000 Jefferson Ave.
| Newport News, VA 23606
| (757) 269-7634