[torqueusers] Signalling on multi node jobs.
Dave Jackson
jacksond at clusterresources.com
Tue Sep 20 12:59:44 MDT 2005
Garrick,
It is a single-line change to kill the process group, but there was
some discussion against it, so it was shelved for the time being. I
think one issue was that if MOM signals a process's children directly,
it may prevent the parent process from cleanly shutting them down using
its own custom method.
Happy to roll it in or make it a configurable option.
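The difference between the historical behaviour (signal only the job script) and the proposed change (signal the whole process group), including the "configurable option" idea, can be sketched in POSIX C. This is a minimal illustration, not TORQUE's actual MOM code: `signal_job` and its `whole_group` flag are hypothetical names. The sketch relies only on `kill()` and `getpgid()`. Passing a negative pid to `kill()` delivers the signal to every process in that process group.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not TORQUE source). Delivers `sig` either to the
 * job's top-level process only (historical behaviour: the user's script)
 * or to every process in that process's group (the proposed change). */
static int signal_job(pid_t top_pid, int sig, bool whole_group)
{
    if (!whole_group)
        return kill(top_pid, sig);      /* only the job script itself */

    pid_t pgid = getpgid(top_pid);      /* group the script belongs to */
    if (pgid == -1)
        return -1;                      /* errno set by getpgid() */
    return kill(-pgid, sig);            /* negative pid: whole group */
}
```

For suspend/resume, `signal_job(pid, SIGSTOP, true)` and `signal_job(pid, SIGCONT, true)` would stop and continue every process in the group. Note the limits of this approach. It only reaches processes on the host where it runs (per the thread, only the mother superior signals anything). It also misses children that moved to their own group via `setpgid()` or `setsid()`. And, as noted above, signalling children directly can bypass a parent's own shutdown logic, which is one argument for keeping it a configurable option.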
Dave
On Tue, 2005-09-20 at 11:35 -0700, Garrick Staples wrote:
> On Mon, Sep 19, 2005 at 10:39:26PM +0200, Roy Dragseth alleged:
> > Hi.
> >
> > On the mpiexec list we have been discussing how to get suspend/resume working
> > with mpiexec. I thought that if you send a signal using qsig or whatever it
> > gets forwarded to all nodes in a job, but that does not seem to be the case.
> > Only the mother superior receives the signal. Is this the intended behaviour?
>
> That is the expected behaviour currently. Only MS signals processes.
> Historically only the "top level" process is signalled (the user's
> script). Dave was talking about changing that to kill() the entire
> process group, but I'm not sure if that happened.
>
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers