[torqueusers] Removing processes after a job is killed

Craig Macdonald craigm at dcs.gla.ac.uk
Wed Jul 2 11:42:50 MDT 2008


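First, limit each node to one job per user; otherwise the kill would also
take out that user's other jobs on the same node. Then you can kill the
user's leftover processes in the epilogue. See below for a cut-down example: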

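#!/bin/bash
# TORQUE passes the job id as $1 and the job owner's user name as $2.
jobid=$1
userid=$2

# Kill every process still owned by the job's user on this node.
# xargs -r skips the kill entirely when ps lists no processes.
ps -U "$userid" -o pid --no-heading | xargs -r kill -KILL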
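
A slightly more defensive variant might look like the sketch below. It is
only an illustration: the MIN_UID threshold and the function names are
assumptions of mine, not anything TORQUE provides, and it still relies on
the one-job-per-user-per-node limit described above. It refuses to touch
root or system accounts, and sends TERM before KILL so processes get a
chance to exit cleanly.

```shell
#!/bin/bash
# Sketch of a more defensive epilogue. MIN_UID and the function names are
# illustrative assumptions, not part of TORQUE.
MIN_UID=${MIN_UID:-1000}

# Succeeds only for an existing account whose UID is at least MIN_UID.
is_safe_target() {
    local uid
    uid=$(id -u "$1" 2>/dev/null) || return 1   # unknown or empty user name
    [ "$uid" -ge "$MIN_UID" ]
}

# Send TERM, wait briefly, then KILL whatever is still running.
kill_user_processes() {
    local user=$1
    pkill -TERM -U "$user"
    sleep 5
    pkill -KILL -U "$user"
    return 0    # pkill exits non-zero when nothing matched; that is fine here
}

# TORQUE passes the job owner's user name as the second argument.
userid=${2:-}
if [ -n "$userid" ] && is_safe_target "$userid"; then
    kill_user_processes "$userid"
fi
```

The UID check is there because the epilogue runs as root, so a bad or
unexpected user name would otherwise be acted on with full privileges.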


C



David Sheen wrote:
> The parallel programming environments we use (e.g. MPICH) use SSH to
> create processes on the sister nodes.  If these jobs fail (are
> deleted, the mother node crashes, etc), the spawned processes remain
> on the sisters and eventually someone has to go and clean them out.
> Is there any way to use epilogue scripts to keep track of these
> processes and make sure they get killed properly if they need to be?
>
> David


