[torqueusers] Removing processes after a job is killed
sheen at usc.edu
Wed Jul 2 11:39:23 MDT 2008
The parallel programming environments we use (e.g. MPICH) rely on SSH to
create processes on the sister nodes. If a job fails (for example, it is
deleted or the mother node crashes), the spawned processes remain on the
sister nodes, and eventually someone has to go and clean them out. Is there
a way to use epilogue scripts to keep track of these processes and make
sure they get killed when they need to be?
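Here is one possible approach, sketched in Python and not tested against any particular Torque setup. After the job ends, the script finds every process the job's user still owns on the node and signals it. It rests on three assumptions that are not stated above: (1) the job owner's user name arrives as the second epilogue argument, (2) nodes are allocated to one job at a time, so anything the user still runs on the node belongs to the dead job, and (3) the script runs as root on every node the job used, not only on the mother node. How to arrange that last part is left open. The names `find_user_pids`, `signal_pids` and `GRACE_SECONDS` are hypothetical.

```python
#!/usr/bin/env python3
"""Hypothetical epilogue helper: kill processes a finished job left behind.

Assumptions (not from the original post):
  * argv[2] is the job owner's user name (Torque epilogue convention).
  * The node ran only this user's job, so every process the user still
    owns here belongs to the job that just ended.
  * The script runs as root on every node the job used.
"""
import os
import pwd
import signal
import sys
import time

GRACE_SECONDS = 5  # hypothetical delay between SIGTERM and SIGKILL


def find_user_pids(uid, proc_root="/proc"):
    """Return the sorted PIDs under proc_root whose real UID equals uid."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "meminfo" or "self"
        status_path = os.path.join(proc_root, entry, "status")
        try:
            with open(status_path) as f:
                for line in f:
                    if line.startswith("Uid:"):
                        # Fields after "Uid:" are real, effective, saved, fs UID.
                        if int(line.split()[1]) == uid:
                            pids.append(int(entry))
                        break
        except OSError:
            continue  # the process exited while we were scanning
    return sorted(pids)


def signal_pids(pids, sig):
    """Send sig to each PID, ignoring processes that have already exited."""
    for pid in pids:
        try:
            os.kill(pid, sig)
        except ProcessLookupError:
            pass


def main(argv):
    uid = pwd.getpwnam(argv[2]).pw_uid
    if uid == 0:
        return 0  # never sweep root's processes
    signal_pids(find_user_pids(uid), signal.SIGTERM)  # ask politely first
    time.sleep(GRACE_SECONDS)
    # Rescan so SIGKILL only goes to the user's processes still running.
    signal_pids(find_user_pids(uid), signal.SIGKILL)
    return 0


if __name__ == "__main__" and len(sys.argv) > 2:
    sys.exit(main(sys.argv))
```

Scanning by user instead of by process ID avoids having to record the SSH-spawned PIDs while the job runs. The cost is that it is unsafe on nodes shared by several jobs belonging to the same user.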