[torqueusers] Signalling on multi node jobs.

Dave Jackson jacksond at clusterresources.com
Tue Sep 20 12:59:44 MDT 2005


Garrick,

  It is a single-line change to kill the process group, but there was
some discussion against it, so this was shelved for the time being.  I
think one issue was that if the mom signals a process's children, it may
prevent the parent process from cleanly shutting them down using its own
custom method.
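
  To make the two behaviours concrete, here is a rough sketch of
signalling only the top-level process versus signalling its whole
process group.  It is written in Python using the POSIX kill/killpg
calls and is not the pbs_mom code; signal_job and spawn_in_own_group
are hypothetical names made up for the illustration.

```python
import os
import signal
import subprocess
import sys


def spawn_in_own_group(argv):
    """Start argv as the leader of a new session and process group.

    Hypothetical helper for this sketch; not part of TORQUE.
    """
    return subprocess.Popen(argv, start_new_session=True)


def signal_job(pid, sig, whole_group=False):
    """Deliver sig to one process, or to every process in its group."""
    if whole_group:
        # Every process still in the group gets the signal, including
        # children the top-level process would rather stop itself.
        os.killpg(os.getpgid(pid), sig)
    else:
        # Only the top-level process is signalled; it has to pass the
        # signal on to its children if they are meant to see it.
        os.kill(pid, sig)
```

  With whole_group=False a job script can still trap the signal and
shut its children down in its own way, which is the concern described
above; with whole_group=True the children are signalled directly.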

  Happy to roll it in or make it a configurable option.

Dave

On Tue, 2005-09-20 at 11:35 -0700, Garrick Staples wrote:
> On Mon, Sep 19, 2005 at 10:39:26PM +0200, Roy Dragseth alleged:
> > Hi.
> > 
> > On the mpiexec list we have been discussing how to get suspend/resume to
> > work with mpiexec.  I thought that if you send a signal using qsig or
> > whatever, it gets forwarded to all nodes in a job, but that does not seem
> > to be the case.  Only the mother superior receives the signal.  Is this
> > the intended behaviour?
> 
> That is the expected behaviour currently.  Only MS signals processes.
> Historically only the "top level" process is signalled (the user's
> script).  Dave was talking about changing that to kill() the entire
> process group, but I'm not sure if that happened.


