[torqueusers] Signalling on multi node jobs.

Stewart.Samuels at sanofi-aventis.com
Tue Sep 20 13:22:13 MDT 2005


I vote for rolling in the option.

	Stewart

-----Original Message-----
From: torqueusers-bounces at supercluster.org
[mailto:torqueusers-bounces at supercluster.org]On Behalf Of Dave Jackson
Sent: Tuesday, September 20, 2005 3:00 PM
To: Garrick Staples
Cc: torqueusers at supercluster.org
Subject: Re: [torqueusers] Signalling on multi node jobs.


Garrick,

  It is a single-line change to kill the process group, but there was
some discussion against it, so this was shelved for the time being.  I
think one issue was that if the mom signals a process's children, it may
prevent the parent process from cleanly shutting them down using its own
custom method.

  Happy to roll it in or make it a configurable option.

Dave
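
For readers following the thread, the difference between signalling only
the top-level process and signalling its whole process group can be
sketched in Python on a POSIX system. This is an illustration only, not
TORQUE's code: signal_job, child_survives and JOB_SCRIPT are hypothetical
names, and the "job" is a stand-in script that starts one child. The
sketch assumes the job runs in its own process group, which
start_new_session=True arranges here.

```python
import os
import select
import signal
import subprocess
import sys

# Hypothetical stand-in for a user's job script: it starts one child,
# reports that it is ready, then idles. Both processes share the stdout
# pipe and the same process group.
JOB_SCRIPT = """
import subprocess, sys, time
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
print("ready", flush=True)
time.sleep(30)
"""


def signal_job(pid, sig, whole_group):
    """Signal only `pid`, or every process in the group `pid` belongs to."""
    if whole_group:
        os.killpg(os.getpgid(pid), sig)  # same effect as kill(-pgid, sig) in C
    else:
        os.kill(pid, sig)


def child_survives(whole_group):
    """Return True if the job's child outlives a SIGTERM sent to the job."""
    job = subprocess.Popen(
        [sys.executable, "-c", JOB_SCRIPT],
        stdout=subprocess.PIPE,
        start_new_session=True,  # the job becomes leader of a new process group
    )
    job.stdout.readline()  # wait until the script has started its child
    signal_job(job.pid, signal.SIGTERM, whole_group)
    job.wait()
    # The child inherited the write end of the pipe, so end-of-file only
    # arrives once every process holding it has exited.
    readable, _, _ = select.select([job.stdout], [], [], 1.0)
    survived = not readable
    if survived:
        os.killpg(job.pid, signal.SIGKILL)  # clean up the orphaned child
    job.stdout.close()
    return survived


if __name__ == "__main__":
    print("top-level only, child survives:", child_survives(False))
    print("whole group, child survives:", child_survives(True))
```

With whole_group set, the child receives the signal directly, which is
also why a parent that wants to shut its children down in its own way
would lose that chance.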

On Tue, 2005-09-20 at 11:35 -0700, Garrick Staples wrote:
> On Mon, Sep 19, 2005 at 10:39:26PM +0200, Roy Dragseth alleged:
> > Hi.
> > 
> > On the mpiexec list we have been discussing how to get suspend/resume to
> > work with mpiexec.  I thought that if you send a signal using qsig or
> > whatever, it gets forwarded to all nodes in a job, but that does not seem
> > to be the case.  Only the mother superior receives the signal.  Is this
> > the intended behaviour?
> 
> That is the expected behaviour currently.  Only MS signals processes.
> Historically only the "top level" process is signalled (the user's
> script).  Dave was talking about changing that to kill() the entire
> process group, but I'm not sure if that happened.
> 
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers

