[torqueusers] Cleaning up stray processes from defunct jobs

Burkhard Bunk bunk at physik.hu-berlin.de
Thu Sep 27 15:42:52 MDT 2012


Hi,

You should mention which MPI implementation you are using.

"Dirty termination" of parallel runs was normal with mpich1, and there
was much discussion about possible ways (scripts) for cleanup.
With OpenMPI, however, this problem seems to be gone.
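
For the cleanup scripts mentioned above, the core idea can be sketched
roughly as follows. This is only an illustration, not a tested tool: the
helper names (parse_ps, find_stray) and the protected account list are
made up for the example, and it assumes you can obtain, by whatever means
suits your site, the set of users who currently have a job on the node.
It only reports candidates and leaves the actual kill to the admin.

```python
def parse_ps(ps_output):
    """Parse 'PID USER COMMAND' lines into (pid, user, command) tuples.

    The input is assumed to come from something like
    `ps -eo pid=,user=,args=` run on the compute node.
    """
    procs = []
    for line in ps_output.splitlines():
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        pid, user, command = parts
        procs.append((int(pid), user, command))
    return procs


def find_stray(procs, active_users, protected_users=("root",)):
    """Return processes whose owner has no active job on this node.

    active_users: users with a running job here (site-specific source,
    an assumption of this sketch).
    protected_users: hypothetical list of accounts never to report.
    """
    return [
        p for p in procs
        if p[1] not in active_users and p[1] not in protected_users
    ]


if __name__ == "__main__":
    sample = (
        "  101 alice  ./mpi_app\n"
        "  202 bob    ./a.out -n 4\n"
        "    1 root   /sbin/init\n"
    )
    for pid, user, command in find_stray(parse_ps(sample), {"alice"}):
        print(pid, user, command)
```

On a real node the protected list would have to cover every system
account (or be replaced by a check on a minimum UID), otherwise daemons
would be reported as stray.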

Regards,
Burkhard Bunk.
----------------------------------------------------------------------
  bunk at physik.hu-berlin.de      Physics Institute, Humboldt University
  fax:    ++49-30 2093 7628     Newtonstr. 15
  phone:  ++49-30 2093 7980     12489 Berlin, Germany
----------------------------------------------------------------------

On Thu, 27 Sep 2012, Dave Ulrick wrote:

> On occasion I see a user run an MPI job via TORQUE that doesn't shut down
> cleanly and as a result leaves running processes behind to interfere with
> subsequent jobs that are assigned to its nodes. Any suggestions on how I
> might go about simplifying the task of finding and killing these
> processes?
>
> Thanks,
> Dave
> -- 
> Dave Ulrick
> d-ulrick at comcast.net
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers
>

