[torqueusers] Cleaning up stray processes from defunct jobs

David Singleton David.Singleton at anu.edu.au
Thu Sep 27 16:04:15 MDT 2012


On 09/28/2012 07:27 AM, Dave Ulrick wrote:
> On occasion I see a user run an MPI job via TORQUE that doesn't shut down
> cleanly and as a result leaves running processes behind to interfere with
> subsequent jobs that are assigned to its nodes. Any suggestions on how I
> might go about simplifying the task of finding and killing these
> processes?
>

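Only support MPIs that launch their remote processes through the TM (task
manager) API, so that every process is started under pbs_mom and is killed
when the job ends.  You'll have to block ssh between nodes to enforce this.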
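
For processes that have already been left behind, a cleanup pass on each node
could look roughly like the sketch below.  This is only an illustration, not a
TORQUE feature: the function names, the MIN_USER_UID cutoff of 1000, and the
idea of passing in the set of UIDs that currently have a job on the node are
all assumptions.  How you obtain that set (from the scheduler, an epilogue
script, etc.) is left to the site.

```python
import os
import signal
import subprocess

# Assumption: ordinary (non-system) accounts start at this UID on the node.
MIN_USER_UID = 1000


def parse_ps(ps_output):
    """Parse headerless `ps` output with two columns (pid, uid)."""
    procs = []
    for line in ps_output.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        procs.append((int(fields[0]), int(fields[1])))
    return procs


def find_stray_pids(procs, active_uids, min_uid=MIN_USER_UID):
    """Return PIDs owned by ordinary users who have no job on this node."""
    return [pid for pid, uid in procs
            if uid >= min_uid and uid not in active_uids]


def kill_strays(active_uids, dry_run=True):
    """List (and optionally SIGKILL) stray processes; returns their PIDs."""
    # Separate -o options with empty headers suppress the header line.
    out = subprocess.run(
        ["ps", "-e", "-o", "pid=", "-o", "uid="],
        capture_output=True, text=True, check=True,
    ).stdout
    strays = find_stray_pids(parse_ps(out), active_uids)
    for pid in strays:
        if dry_run:
            print("would kill", pid)
            continue
        try:
            os.kill(pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # process exited between the listing and the kill
    return strays
```

With the default dry_run=True this only prints the candidate PIDs, which makes
it safer to try on a live node.  On nodes shared between jobs, active_uids has
to include every user with a running job there, or their processes will be
treated as strays.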

Cheers
David


