[torqueusers] Handling of double-fork-and-kill detached processes

Ian Stokes-Rees i.stokes-rees1 at physics.ox.ac.uk
Wed Mar 2 08:57:49 MST 2005


[I just sent this to the GridEngine mailing list, but am also interested 
in how Torque (and PBS, if anyone knows) handles this situation]

Hi,

How does Torque deal with processes which detach from their parent 
process via the common "double fork and kill" technique?  I'm just 
wondering if it is possible for users to start a process which then 
sticks around even when the original process group has been killed.  We 
seem to be having this problem with a current cluster and are wondering 
if Torque does anything "auto-magically" to catch these processes and 
kill them.
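
For reference, here is a minimal sketch of the detach pattern in question, 
assuming a POSIX system. The function name spawn_detached and the sleep 
workload are made up for illustration; nothing here is taken from Torque, 
PBS or GridEngine code. The first child starts a new session, forks again 
and exits, leaving a grandchild that is no longer in the original process 
group.

```python
import os
import time


def spawn_detached(seconds):
    """Double-fork a process that sleeps for `seconds`; return its PID.

    Hypothetical helper for illustration only.
    """
    read_fd, write_fd = os.pipe()
    child = os.fork()
    if child == 0:
        # First child: leave the caller's session and process group.
        os.close(read_fd)
        os.setsid()
        grandchild = os.fork()
        if grandchild == 0:
            # Grandchild: the process that "sticks around".
            os.close(write_fd)
            # Drop the inherited stdio so nothing ties it to the caller.
            devnull = os.open(os.devnull, os.O_RDWR)
            for fd in (0, 1, 2):
                os.dup2(devnull, fd)
            time.sleep(seconds)  # stand-in for the real workload
            os._exit(0)
        # Intermediate process: report the grandchild's PID, then exit.
        # The grandchild is now orphaned and reparented to init.
        os.write(write_fd, str(grandchild).encode())
        os._exit(0)
    # Original process: read the PID and reap the intermediate child.
    os.close(write_fd)
    data = os.read(read_fd, 32)
    os.close(read_fd)
    os.waitpid(child, 0)
    return int(data)
```

After a call such as pid = spawn_detached(60), a signal sent to the 
caller's process group (for example with os.killpg) does not reach pid, 
because the grandchild belongs to a different session and process group.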

Our first idea was to kill all of a user's processes on a node once 
their job finished. We then realised that the node might have two (or 
four) CPUs, or there may be process overloading, so the same user may 
have more than one legitimate job running on that node at the same time. 
Killing everything they own would therefore be a no-no: the other job 
would be killed too.

Thanks for suggestions regarding how this is handled in Torque.

Cheers,

Ian.
-- 
Ian Stokes-Rees                 i.stokes-rees at physics.ox.ac.uk
Particle Physics, Oxford        http://www-pnp.physics.ox.ac.uk/~stokes

