[torqueusers] Removing processes after a job is killed

David Sheen sheen at usc.edu
Wed Jul 2 11:39:23 MDT 2008


The parallel programming environments we use (e.g. MPICH) use SSH to
create processes on the sister nodes.  If a job fails (it is deleted,
the mother node crashes, etc.), the spawned processes remain on the
sisters, and eventually someone has to go and clean them out by hand.
Is there any way to use epilogue scripts to keep track of these
processes and make sure they are killed properly when that happens?
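
To make the question concrete, here is a rough sketch of the kind of
cleanup I have in mind.  It is only an illustration under several
assumptions, not a tested solution: it assumes the script is somehow
run on each sister node after the job ends (how that is arranged is
left open), that the job owner's user name arrives as the second
argument the way it does for an epilogue script, and that the owner
has no other jobs on that node, since it signals every process the
owner has there.  The name parse_pids is made up for this example.

```python
import os
import signal
import subprocess
import sys


def parse_pids(ps_output, keep_pids=()):
    """Return the PIDs found in `ps -o pid=` output, minus keep_pids."""
    keep = set(keep_pids)
    pids = []
    for line in ps_output.splitlines():
        field = line.strip()
        # Skip blank lines and anything that is not a plain number.
        if not field.isdigit():
            continue
        pid = int(field)
        if pid not in keep:
            pids.append(pid)
    return pids


def main(argv):
    # Assumption: argv[2] is the job owner's user name.
    owner = argv[2]
    # List every process owned by that user; ps exits non-zero when
    # nothing matches, so the return code is deliberately not checked.
    result = subprocess.run(
        ["ps", "-u", owner, "-o", "pid="],
        capture_output=True,
        text=True,
    )
    # Never signal this script itself.
    for pid in parse_pids(result.stdout, keep_pids=[os.getpid()]):
        try:
            os.kill(pid, signal.SIGTERM)
        except (ProcessLookupError, PermissionError):
            # The process already exited, or it is not ours to signal.
            pass


if __name__ == "__main__" and len(sys.argv) > 2:
    main(sys.argv)
```

Killing by user name is the crude part: it cannot tell one job's
processes from another's, which is why I am asking whether the
processes can be tracked per job instead.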

David

