[torqueusers] Signalling on multi node jobs.

David Golden dgolden at cp.dias.ie
Thu Sep 22 06:52:29 MDT 2005


On 2005-09-20 12:38:55 -0700, Garrick Staples wrote:

> What if user processes could tell MOM what they want through the TM
> interface?  Maybe by default a suspend is sent to all process groups,
> but then (hypothetically) mpiexec could tell MOM, "please just tell me
> about a suspend request, I'll handle it myself."

A TM interface to do it might well be nice.
It would indeed be handy if signalling behaviour were easily configurable
per job, presumably with per-queue defaults for each signal, and with a
per-queue setting that says whether a job may override those defaults.
How hard can it be? :-)
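
For argument's sake, the lookup order I have in mind could be sketched
like this. It is Python purely for illustration; nothing like it exists in
torque, and the function name, the argument names and the "ALL" fallback are
all made up:

```python
# Hypothetical sketch: torque has no such setting. All names are invented.
def effective_mode(signal_name, queue_defaults, job_overrides, overridable,
                   fallback="ALL"):
    """Pick the propagation mode for one signal.

    queue_defaults: {signal name: mode} set by the admin on the queue
    job_overrides:  {signal name: mode} requested by the job
    overridable:    set of signal names the queue lets a job override
    fallback:       mode used when the queue sets no default (an assumption)
    """
    # A per-job setting only wins if the queue permits overriding that signal.
    if signal_name in job_overrides and signal_name in overridable:
        return job_overrides[signal_name]
    # Otherwise the queue default applies, then the global fallback.
    return queue_defaults.get(signal_name, fallback)
```

So a job asking for ALL on a signal whose queue default is HEAD only gets
ALL if the queue lists that signal as overridable.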

-> track signal behaviours as attributes of a job, and let the propagation
behaviour be selected with, say (for argument's sake):
#PBS -W sighandling=SIGFOO:HEAD,SIGBAR:IGNORE,SIGBAZ:ALL

(where there are various options, maybe:
HEAD: to the initial job script on the mother superior
HEADPGRP: to the process group on the mother superior
ALL: to all tasks spawned via TM associated with the job on any node
ALLPGRP: to the process groups of all tasks associated with the job
The behaviour of the special suspend/resume could maybe also be
configurable.)
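
Parsing such a string would be the easy part. A minimal sketch, again in
Python just for illustration: the "sighandling" attribute and the mode names
are only the hypothetical ones from this message (IGNORE is taken from the
example line), and parse_sighandling is a made-up name:

```python
# Hypothetical sketch: neither "sighandling" nor these modes exist in torque.
MODES = {"HEAD", "HEADPGRP", "ALL", "ALLPGRP", "IGNORE"}


def parse_sighandling(spec):
    """Turn 'SIGFOO:HEAD,SIGBAR:IGNORE' into {'SIGFOO': 'HEAD', ...}."""
    policy = {}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate a stray or trailing comma
        name, sep, mode = item.partition(":")
        name, mode = name.strip().upper(), mode.strip().upper()
        # Signal names are only checked for the SIG prefix, since the
        # example uses placeholders such as SIGFOO.
        if not sep or not name.startswith("SIG") or mode not in MODES:
            raise ValueError(f"bad sighandling entry: {item!r}")
        policy[name] = mode
    return policy
```

MOM would then look up the incoming signal in the resulting table to decide
where to deliver it.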

A means to change the behaviour from the command line in the middle
of a job might also be nice.

I'm not too familiar with the torque source code, but from a glance through it,
it seemed to me that one could do it. I haven't tried yet though :-) - I imagine
someone used to hacking torque would be a lot faster...


