[torqueusers] Signalling on multi node jobs.
dgolden at cp.dias.ie
Thu Sep 22 06:52:29 MDT 2005
On 2005-09-20 12:38:55 -0700, Garrick Staples wrote:
> What if user processes could tell MOM what they want through the TM
> interface? Maybe by default a suspend is sent to all process groups,
> but then (hypothetically) mpiexec could tell MOM, "please just tell me
> about a suspend request, I'll handle it myself."
A TM interface to do it might well be nice.
It would indeed be handy if signalling behaviour were easily configurable per
job, presumably also with per-queue defaults for each signal, and with whether
a job may override those defaults with its own setting being itself
configurable per queue... How hard can it be? :-)
-> tracking signal behaviours as attributes of a job, and being able to
select the propagation behaviour per signal, say (for argument's sake) with e.g.
#PBS -W sighandling=SIGFOO:HEAD,SIGBAR:IGNORE,SIGBAZ:ALL
(where there are various options, maybe:
HEAD: to the initial job script on the mother superior
HEADPGRP: to the process group on the mother superior
ALL: to all tasks spawned via TM associated with a job on any node.
ALLPGRP: to the process groups of all tasks associated with the job.)
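To make the proposal a bit more concrete, here is a minimal Python sketch of how such a job attribute might be parsed and turned into a list of delivery targets. Everything in it is hypothetical: `sighandling` is not an existing TORQUE attribute, and `Propagation`, `parse_sighandling`, `Task` and `delivery_targets` are made-up names. Real signal names stand in for the SIGFOO/SIGBAR placeholders, and IGNORE is assumed to mean "don't deliver at all". It only works out *which* processes would be signalled; it doesn't talk to MOM or the TM API.

```python
import signal
from dataclasses import dataclass
from enum import Enum


class Propagation(Enum):
    """Hypothetical per-signal propagation modes from the proposal."""
    HEAD = "HEAD"          # initial job script on the mother superior
    HEADPGRP = "HEADPGRP"  # process group of that script on the mother superior
    ALL = "ALL"            # every TM-spawned task of the job, on any node
    ALLPGRP = "ALLPGRP"    # process groups of all tasks of the job
    IGNORE = "IGNORE"      # assumption: signal is not delivered at all


def parse_sighandling(value: str) -> dict[int, Propagation]:
    """Parse e.g. 'SIGTERM:HEAD,SIGUSR2:ALL' into {signal number: mode}."""
    result: dict[int, Propagation] = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        name, sep, mode = item.partition(":")
        if not sep:
            raise ValueError(f"missing ':' in {item!r}")
        try:
            signum = signal.Signals[name.strip().upper()]
        except KeyError:
            raise ValueError(f"unknown signal {name!r}") from None
        # Propagation(...) raises ValueError for an unknown mode.
        result[int(signum)] = Propagation(mode.strip().upper())
    return result


@dataclass
class Task:
    """Hypothetical record of one process belonging to the job."""
    node: str
    pid: int
    pgid: int
    is_job_script: bool = False  # True only for the initial job script


def delivery_targets(mode: Propagation, tasks: list[Task],
                     mother_superior: str) -> list[tuple[str, str, int]]:
    """Return (node, 'pid' or 'pgrp', id) tuples that would be signalled."""
    if mode is Propagation.IGNORE:
        return []
    if mode in (Propagation.HEAD, Propagation.HEADPGRP):
        heads = [t for t in tasks
                 if t.is_job_script and t.node == mother_superior]
        if mode is Propagation.HEAD:
            return [(t.node, "pid", t.pid) for t in heads]
        return [(t.node, "pgrp", t.pgid) for t in heads]
    if mode is Propagation.ALL:
        return [(t.node, "pid", t.pid) for t in tasks]
    # ALLPGRP: one entry per distinct (node, process group) pair.
    targets: list[tuple[str, str, int]] = []
    for t in tasks:
        key = (t.node, "pgrp", t.pgid)
        if key not in targets:
            targets.append(key)
    return targets
```

A signal with no entry in the parsed dict would then fall back to whatever the (equally hypothetical) per-queue default is, e.g. `modes.get(signum, queue_default)`.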
The behaviour of the special suspend/resume could maybe also be configurable
in the same way.
A means to change the behaviour from the command line in the middle
of a job might also be nice.
I'm not too familiar with the torque source code, but on a glance through it,
it seemed to me one could do it. I haven't tried yet, though :-) I imagine
someone used to hacking torque would be a lot faster...