[torqueusers] Removing processes after a job is killed
craigm at dcs.gla.ac.uk
Wed Jul 2 11:42:50 MDT 2008
First, limit each node to one job per user. The epilogue can then kill
every process that user still owns on the node without touching another
of their jobs. A cut-down example:
ps -U $userid -o pid --no-heading | xargs -r kill -KILL
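
A slightly fuller sketch of the same idea, in case it helps. It assumes the epilogue is called with the job id as $1 and the job's user name as $2, and that procps ps and GNU xargs (for -r) are installed. The function names are mine, purely illustrative. As far as I know, sister nodes run epilogue.parallel rather than epilogue, so the script would probably need to be installed under that name as well.

```shell
#!/bin/sh
# Sketch of a Torque epilogue that clears a user's leftover processes.
# Assumption: pbs_mom passes the job id as $1 and the job owner as $2.
# Only safe if each node is limited to one job per user.

# safe_to_clean USER: succeed only for a non-empty, non-root user name.
safe_to_clean() {
    [ -n "$1" ] && [ "$1" != "root" ]
}

# leftover_pids USER: print the PID of every process owned by USER.
leftover_pids() {
    ps -U "$1" -o pid --no-headers
}

# cleanup_user USER: send SIGKILL to everything USER still runs here.
# Does nothing (and returns success) if USER fails the safety check.
cleanup_user() {
    safe_to_clean "$1" || return 0
    leftover_pids "$1" | xargs -r kill -KILL
}

# Act only when called with the usual arguments.
if [ "$#" -ge 2 ]; then
    cleanup_user "$2"
fi
```

The root check is there because a mistaken or empty user name would otherwise let the epilogue kill system processes.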
David Sheen wrote:
> The parallel programming environments we use (e.g. MPICH) use SSH to
> create processes on the sister nodes. If these jobs fail (are
> deleted, the mother node crashes, etc), the spawned processes remain
> on the sisters and eventually someone has to go and clean them out.
> Is there any way to use epilogue scripts to keep track of these
> processes and make sure they get killed properly if they need to be?