[torqueusers] Cleaning up stray processes from defunct jobs
Troy Baer
tbaer at utk.edu
Thu Sep 27 15:40:04 MDT 2012
On Thu, 2012-09-27 at 16:27 -0500, Dave Ulrick wrote:
> On occasion I see a user run an MPI job via TORQUE that doesn't shut down
> cleanly and as a result leaves running processes behind to interfere with
> subsequent jobs that are assigned to its nodes. Any suggestions on how I
> might go about simplifying the task of finding and killing these
> processes?
I would recommend running something like reaver [1] in your
epilogue.parallel on each node.
[1] http://svn.nics.tennessee.edu/repos/pbstools/trunk/sbin/reaver
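
For readers who want a feel for what such a cleanup step has to do, here is a
minimal Python sketch. It is not reaver and does not claim to mirror how reaver
works; it only illustrates the general idea of flagging processes whose owners
have no job on the node. The names find_stray_processes, list_processes and
MIN_USER_UID are hypothetical, the UID cutoff of 1000 is an assumption about the
site's account layout, and how the set of allowed UIDs is obtained (for example,
from the jobs the local pbs_mom still knows about) is left to the site. The
sketch only reports candidates; it does not kill anything.

```python
import subprocess

# Assumption: UIDs below this value are system accounts and are never flagged.
MIN_USER_UID = 1000


def list_processes():
    """Return one line per process as 'pid uid command', with no header row."""
    result = subprocess.run(
        ["ps", "-e", "-o", "pid=", "-o", "uid=", "-o", "comm="],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def find_stray_processes(ps_output, allowed_uids, min_uid=MIN_USER_UID):
    """Return (pid, uid, command) tuples for likely leftover user processes.

    ps_output    -- text in the 'pid uid command' format from list_processes()
    allowed_uids -- set of UIDs that currently have a job on this node
                    (how this set is built is site-specific)
    """
    strays = []
    for line in ps_output.splitlines():
        fields = line.split(None, 2)
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        pid, uid, command = fields
        if not (pid.isdigit() and uid.isdigit()):
            continue  # skip anything that is not a plain process row
        if int(uid) < min_uid:
            continue  # system account, leave it alone
        if int(uid) in allowed_uids:
            continue  # owner still has a job here
        strays.append((int(pid), int(uid), command))
    return strays
```

On a node, find_stray_processes(list_processes(), allowed_uids) would return the
candidates to review. Reporting first and killing only after the list looks
right is the safer way to introduce something like this into an epilogue.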
--Troy
--
Troy Baer, Senior HPC System Administrator
National Institute for Computational Sciences, University of Tennessee
http://www.nics.tennessee.edu/
Phone: 865-241-4233