[torqueusers] Re: kill_delay

Pete Wyckoff pw at osc.edu
Tue Feb 27 13:23:04 MST 2007


Roy.Dragseth at cc.uit.no wrote on Tue, 27 Feb 2007 10:50 +0100:
> After some tinkering with the code I've come to the conclusion that the kill 
> loop makes a lot of sense for parallel jobs, as you want to give an MPI
> launcher the time to clean up before it is killed with an untrappable signal.
> The loop is only executed on a SIGKILL.  The annoying delay should be fixed 
> by doing a fork.
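
A minimal sketch of the grace-period idea Roy describes, assuming the kill loop works roughly like this: send a trappable SIGTERM, wait up to kill_delay seconds for the process to exit, and only then send the untrappable SIGKILL. It is written in Python rather than Torque's C for brevity. The names kill_with_delay and spawn_launcher are hypothetical and are not taken from pbs_mom.

```python
import os
import signal
import time


def spawn_launcher(trap_term):
    """Fork a hypothetical stand-in for an MPI launcher.

    With trap_term=True the child "cleans up" and exits on SIGTERM;
    with trap_term=False it ignores SIGTERM, so only SIGKILL stops it.
    """
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:  # child
        os.close(r)
        if trap_term:
            # Launcher cleanup would happen here; then exit cleanly.
            signal.signal(signal.SIGTERM, lambda signum, frame: os._exit(0))
        else:
            signal.signal(signal.SIGTERM, signal.SIG_IGN)
        os.write(w, b"x")  # tell the parent the handler is in place
        os.close(w)
        while True:
            signal.pause()  # sleep until a signal arrives
    os.close(w)
    os.read(r, 1)  # block until the child is ready for signals
    os.close(r)
    return pid


def kill_with_delay(pid, kill_delay, poll_interval=0.1):
    """Send SIGTERM, give the child up to kill_delay seconds to exit,
    then fall back to the untrappable SIGKILL.

    Returns "term" if the child exited during the grace period and
    "kill" if SIGKILL was needed. Blocks the caller for up to kill_delay.
    """
    os.kill(pid, signal.SIGTERM)
    deadline = time.monotonic() + kill_delay
    while time.monotonic() < deadline:
        done, _status = os.waitpid(pid, os.WNOHANG)
        if done == pid:
            return "term"
        time.sleep(poll_interval)
    os.kill(pid, signal.SIGKILL)
    os.waitpid(pid, 0)  # reap the killed child
    return "kill"
```

For example, kill_with_delay(spawn_launcher(trap_term=True), kill_delay=2) returns "term", while a launcher that ignores SIGTERM is only stopped by SIGKILL after roughly kill_delay seconds. That blocking wait is the delay Roy proposes to remove with a fork; note that a forked helper is not the target's parent, so it could not use waitpid and would need some other way to detect the exit.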

Apologies in advance if I'm not paying enough attention to Torque
development lately.  The kill loop in mom appears to be the source
of a regression in Torque that affects mpiexec users:

http://www.supercluster.org/pipermail/torqueusers/2006-November/004714.html

Do things work properly now so that a parallel job launcher gets
the obit signals and can clean up?  If so, I'll be happy to remove
that issue from the list.  Thanks,

		-- Pete

