[torqueusers] Cleaning up stray processes from defunct jobs

Troy Baer tbaer at utk.edu
Thu Sep 27 15:40:04 MDT 2012


On Thu, 2012-09-27 at 16:27 -0500, Dave Ulrick wrote:
> On occasion I see a user run an MPI job via TORQUE that doesn't shut down 
> cleanly and as a result leaves running processes behind to interfere with 
> subsequent jobs that are assigned to its nodes. Any suggestions on how I 
> might go about simplifying the task of finding and killing these 
> processes?

I would recommend running something like reaver [1] in your
epilogue.parallel on each node.

[1] http://svn.nics.tennessee.edu/repos/pbstools/trunk/sbin/reaver
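
For illustration, the general idea behind this kind of cleanup can be
sketched in Python. This is not reaver's actual code, only a minimal
sketch under stated assumptions: the function names are hypothetical,
the caller is assumed to already know which UIDs still have jobs on the
node, and UIDs below 1000 are assumed to be system accounts (the
threshold varies by distribution). It only reports candidate PIDs and
does not kill anything.

```python
import subprocess


def parse_ps(output):
    """Parse the output of `ps -e -o pid= -o uid=` into (pid, uid) tuples."""
    procs = []
    for line in output.splitlines():
        fields = line.split()
        # Skip anything that is not exactly two numeric columns.
        if len(fields) != 2 or not all(f.isdigit() for f in fields):
            continue
        procs.append((int(fields[0]), int(fields[1])))
    return procs


def find_stray_pids(procs, active_uids, min_uid=1000):
    """Return PIDs owned by ordinary users who have no job on this node.

    active_uids: UIDs that still have a job here (supplied by the caller;
                 how to obtain it is site-specific and not shown).
    min_uid:     assumed boundary between system and user accounts.
    """
    return [pid for pid, uid in procs
            if uid >= min_uid and uid not in active_uids]


def list_processes():
    """Read (pid, uid) pairs for every process on this node via ps."""
    result = subprocess.run(
        ["ps", "-e", "-o", "pid=", "-o", "uid="],
        capture_output=True, text=True, check=True,
    )
    return parse_ps(result.stdout)
```

A script run on each node could call
find_stray_pids(list_processes(), active_uids) and log the result
before deciding whether to signal those processes. Keeping the
filtering separate from the ps call makes the logic easy to check
without touching a live node.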

	--Troy
-- 
Troy Baer, Senior HPC System Administrator
National Institute for Computational Sciences, University of Tennessee
http://www.nics.tennessee.edu/
Phone:  865-241-4233



