[torqueusers] Handling of double-fork-and-kill detached processes

David Singleton David.Singleton at anu.edu.au
Thu Mar 3 13:51:02 MST 2005


Ian Stokes-Rees wrote:
> How does Torque deal with processes which detach from their parent 
> process via the common "double fork and kill" technique?  I'm just 
> wondering if it is possible for users to start a process which then 
> sticks around even when the original process group has been killed.  We 
> seem to be having this problem with a current cluster and are wondering 
> if Torque does anything "auto-magically" to catch these processes and 
> kill them.
> 

PBS, and I presume Torque, uses process session IDs to identify the
processes in a job (strictly, a session ID identifies an individual
task of a job).  Simple "fork, fork and exit" daemonizing code like the
following, which does not change the session ID, is OK.

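     if ( !fork() )  {              /* first child */
         if ( fork() ) exit(0);     /* first child exits, orphaning the grandchild */
         else sleep(60);            /* grandchild: reparented to init, same SID */
     }
     else
         sleep(60);                 /* original process */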

  > ps -fj
UID        PID  PPID  PGID   SID  C STIME TTY          TIME CMD
dbs900    1744  1742  1744  1744  0 07:28 pts/6    00:00:00 -tcsh
dbs900    1765  1744  1765  1744  0 07:28 pts/6    00:00:00 ./a.out
dbs900    1766  1765  1765  1744  0 07:28 pts/6    00:00:00 [a.out <defunct>]
dbs900    1767     1  1765  1744  0 07:28 pts/6    00:00:00 ./a.out

All of these processes keep the same session ID (1744 here), so
processes like these are all cleaned up when a job is killed or exits.
But a process that simply calls setsid() (and all of its children)
does escape from a PBS job.
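
To make the difference concrete, here is a minimal sketch (not code from
PBS or Torque) that double-forks and reports the session ID the grandchild
ends up with.  The helper name grandchild_sid() and the pipe used to pass
the result back are assumptions made for illustration; only fork(),
setsid(), getsid(), pipe() and waitpid() are standard POSIX calls.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: double-fork, optionally call setsid() in the
 * grandchild, and return the grandchild's session ID (-1 on error). */
pid_t grandchild_sid(int call_setsid)
{
    int fds[2];
    pid_t sid = -1;

    if (pipe(fds) == -1)
        return -1;

    pid_t child = fork();
    if (child == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (child == 0) {                     /* first child */
        close(fds[0]);
        if (fork() == 0) {                /* grandchild: the "daemon" */
            if (call_setsid)
                setsid();                 /* start a new session */
            pid_t mine = getsid(0);
            if (write(fds[1], &mine, sizeof mine) != (ssize_t)sizeof mine)
                _exit(1);
            _exit(0);
        }
        _exit(0);                         /* first child exits at once */
    }

    /* original process: read the session ID reported by the grandchild */
    close(fds[1]);
    if (read(fds[0], &sid, sizeof sid) != (ssize_t)sizeof sid)
        sid = -1;
    close(fds[0]);
    waitpid(child, NULL, 0);              /* reap the first child */
    return sid;
}
```

Called from a main(), grandchild_sid(0) should return the caller's own
getsid(0), while grandchild_sid(1) should return a different value: the
new session created by setsid(), which a purely session-based cleanup
would not match.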

Two things:
   1. PBS loses the cputime used by daemonized processes when they
      exit because there is no parent in the job to inherit that
      usage.
   2. Covering the setsid() case requires using an alternative job
      identifier like PAGG (http://oss.sgi.com/projects/pagg/)
      or cpusets, or just hacking Linux to add a "sid" that users
      can't change.
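
Point 1 can be seen at the kernel level with getrusage().  The sketch
below is only an illustration of the mechanism and makes no claim about
how PBS itself gathers usage; the helper names burn_cpu(), children_cpu()
and cpu_charged_to_parent() are hypothetical.  It runs a CPU-bound worker
either as a direct child or as a double-forked grandchild and returns how
much child CPU time the caller was charged.

```c
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Burn roughly `seconds` of CPU time in the calling process. */
static void burn_cpu(double seconds)
{
    clock_t start = clock();
    while ((double)(clock() - start) / CLOCKS_PER_SEC < seconds)
        ;
}

/* CPU seconds (user + system) charged so far for waited-for children. */
static double children_cpu(void)
{
    struct rusage ru;
    getrusage(RUSAGE_CHILDREN, &ru);
    return ru.ru_utime.tv_sec + ru.ru_utime.tv_usec / 1e6
         + ru.ru_stime.tv_sec + ru.ru_stime.tv_usec / 1e6;
}

/* Hypothetical helper: run the worker as a direct child (daemonize == 0)
 * or as a double-forked grandchild (daemonize != 0).  Returns the child
 * CPU time charged to the caller, or -1.0 on error. */
double cpu_charged_to_parent(int daemonize, double seconds)
{
    int fds[2];
    char c;
    double before = children_cpu();

    if (pipe(fds) == -1)
        return -1.0;

    pid_t child = fork();
    if (child == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1.0;
    }
    if (child == 0) {
        close(fds[0]);
        if (daemonize && fork() != 0)
            _exit(0);           /* intermediate process exits at once */
        burn_cpu(seconds);      /* direct child, or orphaned grandchild */
        _exit(0);               /* exit closes the last pipe writer */
    }

    close(fds[1]);
    while (read(fds[0], &c, 1) > 0)   /* blocks until the worker exits (EOF) */
        ;
    close(fds[0]);
    waitpid(child, NULL, 0);          /* reap the direct child */
    return children_cpu() - before;
}
```

With a 0.2 second worker, cpu_charged_to_parent(0, 0.2) should come back
at about 0.2, while cpu_charged_to_parent(1, 0.2) should be close to zero:
the grandchild's usage goes to whichever process reaps it after it is
reparented (normally init), not to anything in the job.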

David

-- 
--------------------------------------------------------------------------
                                     ANU Supercomputer Facility
    David.Singleton at anu.edu.au       and APAC National Facility
    Phone: +61 2 6125 4389           Leonard Huxley Bldg (No. 56)
    Fax:   +61 2 6125 8199           Australian National University
                                     Canberra, ACT, 0200, Australia
--------------------------------------------------------------------------

