[torqueusers] Removing processes after a job is killed

Prakash Velayutham prakash.velayutham at cchmc.org
Wed Jul 2 11:46:55 MDT 2008


Hi,

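Have you looked at mpiexec for Torque? It starts the MPI processes
through Torque's Task Management (TM) interface instead of over SSH,
so Torque knows about every process on the sister nodes and cleans
them up for you when the job is removed with qdel or killed some
other way. It is available from www.osc.edu/~pw/mpiexec/index.php.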
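
For what it's worth, a job script using it could look roughly like the sketch below. The resource request, the program name ./my_mpi_program and the launch_mpi helper are placeholders of mine, not part of mpiexec. The only real call is mpiexec itself, which as far as I know starts one process per processor that Torque allocated to the job unless told otherwise.

```shell
#!/bin/bash
#PBS -N mpi_job
#PBS -l nodes=2:ppn=4

# launch_mpi is a hypothetical helper: change to the directory the job
# was submitted from and hand the program to mpiexec. Because mpiexec
# spawns the ranks through Torque's TM interface, Torque can track and
# remove them itself when the job is deleted.
launch_mpi() {
    cd "${PBS_O_WORKDIR:-.}" || return 1
    mpiexec "$@"
}

# PBS_JOBID is set by Torque inside a running job, so the launch only
# happens when the script is executed as a job.
if [ -n "${PBS_JOBID:-}" ]; then
    launch_mpi ./my_mpi_program
fi
```

Submit it with qsub as usual. No SSH-spawned processes are involved, so there should be nothing left behind for an epilogue to hunt down.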

Prakash

On Jul 2, 2008, at 1:42 PM, Craig Macdonald wrote:

> First, limit each node to one job per user. Then you can kill all of
> that user's remaining processes in the epilogue. A cut-down example:
>
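> #!/bin/bash
> # Torque passes the job id as $1 and the job owner's user name as $2.
> jobid=$1
> userid=$2
>
> # Kill every process still owned by the job's user on this node.
> # Only safe if each node runs at most one job per user.
> ps -U "$userid" -o pid --no-heading | xargs -r kill -KILL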
>
>
> C
>
>
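
A slightly more defensive variant of the epilogue above, as a sketch only. It assumes ordinary accounts have UIDs of 1000 or higher (adjust MIN_UID for your site) and that a grace period of a few seconds is acceptable. The function names, MIN_UID and GRACE_SECONDS are made up for illustration. It refuses to touch system accounts such as root, and it sends TERM before falling back to KILL.

```shell
#!/bin/bash
# Torque passes the job id as $1 and the job owner's user name as $2.

# Succeed only for ordinary accounts. The default threshold of 1000 is
# an assumption; unknown users are rejected as well.
is_regular_user() {
    local uid
    uid=$(id -u "$1" 2>/dev/null) || return 1
    [ "$uid" -ge "${MIN_UID:-1000}" ]
}

# Send signal $1 to every process owned by user $2 on this node.
# "-o pid=" prints the PIDs without a header line.
signal_user_procs() {
    local sig=$1 user=$2
    ps -U "$user" -o pid= | xargs -r kill "-$sig" 2>/dev/null
    return 0
}

# Ask the user's processes to exit, wait, then kill what is left.
cleanup_user() {
    local user=$1
    is_regular_user "$user" || return 0
    signal_user_procs TERM "$user"
    sleep "${GRACE_SECONDS:-5}"
    signal_user_procs KILL "$user"
}

# Only act when called with the epilogue's arguments.
if [ "$#" -ge 2 ]; then
    cleanup_user "$2"
fi
```

Like the original, this is only safe when each node runs at most one job per user, since it cannot tell one job's processes from another's.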
>
> David Sheen wrote:
>> The parallel programming environments we use (e.g. MPICH) use SSH to
>> create processes on the sister nodes.  If these jobs fail (are
>> deleted, the mother node crashes, etc), the spawned processes remain
>> on the sisters and eventually someone has to go and clean them out.
>> Is there any way to use epilogue scripts to keep track of these
>> processes and make sure they get killed properly if they need to be?
>>
>> David
>> _______________________________________________
>> torqueusers mailing list
>> torqueusers at supercluster.org
>> http://www.supercluster.org/mailman/listinfo/torqueusers
>>

Prakash Velayutham
Programmer / Analyst
Cincinnati Children's Hospital Medical Center