[torqueusers] Epilogue script

Troy Baer troy at osc.edu
Tue Aug 22 08:40:42 MDT 2006


On Tue, 2006-08-22 at 10:37 +0100, Eugene van den Hurk wrote:
> I am looking at implementing torque on our cluster.
> I have been looking at using an epilogue script to clean up after 
> jobs, particularly if the job is aborted or deleted.
> This seems to be particularly needed in the case when running jobs 
> using mpich and mpirun.
> I have looked at using mpiexec instead of mpirun. I installed mpiexec 
> and it seems to work fine.
> Can anyone think of any reason why using mpiexec instead of mpirun is 
> a bad idea?

None that I can think of, but then again I'm biased. :)

The things we've run into at OSC that sometimes can make mpiexec
problematic are ISV codes that are compiled statically against MPICH/p4
and insist on invoking mpirun under the covers.  The parallel version of
Turbomole is the one with which I remember having the most trouble, but
I think the parallel version of Abaqus works like this as well.  Pete W.
developed an mpirun "replacement" that took all the well-known MPICH
mpirun arguments, ignored most of them, and invoked mpiexec.  However, I
don't know if he's made that part of the mpiexec distribution or not.
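
[Illustration: a minimal sketch of the wrapper idea described above. It is
not Pete W.'s actual script. The option names handled here (-np,
-machinefile, -nolocal, -v) are an assumed subset of MPICH mpirun's
arguments, and the sketch assumes mpiexec accepts "-n <count>". An mpirun
stand-in might translate the arguments roughly like this:]

```python
import os
import sys

# Assumed subset of MPICH mpirun options (illustrative, not exhaustive).
IGNORED_WITH_VALUE = {"-machinefile"}   # option plus one argument, both dropped
IGNORED_FLAGS = {"-nolocal", "-v"}      # stand-alone flags, dropped


def translate_args(args):
    """Map an mpirun-style argument list to an mpiexec command line."""
    cmd = ["mpiexec"]
    i = 0
    while i < len(args):
        arg = args[i]
        if arg == "-np" and i + 1 < len(args):
            # Keep the process count, using mpiexec's spelling.
            cmd += ["-n", args[i + 1]]
            i += 2
        elif arg in IGNORED_WITH_VALUE and i + 1 < len(args):
            i += 2
        elif arg in IGNORED_FLAGS:
            i += 1
        else:
            # First unrecognized token is taken to be the program;
            # it and everything after it are passed through untouched.
            cmd += args[i:]
            break
    return cmd


def main(argv):
    """Replace the current process with the translated mpiexec command."""
    command = translate_args(argv[1:])
    os.execvp(command[0], command)
```

[A file installed under the name mpirun ahead of the real one in PATH
could call main(sys.argv), so that codes which invoke mpirun internally
end up launching through mpiexec instead.]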

> If I use mpiexec instead of mpirun, would I be right in thinking that
> it is still a good idea to use epilogue
> scripts for other types of jobs?
> Each node is dual processor so I do not want to kill processes based 
> on username, as a user may have more than one job on a node.
> So it looks like I would have to use a script that would be able to 
> kill orphaned processes based on job id.
> Would anyone have any suggestions as to how I could do this or sample 
> scripts that I could try?

Take a look at the reaver script in the pbstools SVN head:

http://svn.osc.edu/repos/pbstools/trunk/sbin/reaver

It's designed to identify (and optionally kill) processes which are
*not* owned by either system userids or userids with jobs assigned to
the node, but you can give it a list of jobids to clean up as well.
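
[Illustration: a minimal sketch of one way to find processes by job id.
This is not how reaver works. It assumes a Linux /proc filesystem and
assumes that job processes still carry the PBS_JOBID environment variable
that TORQUE sets. Processes started outside the job's environment (for
example via rsh) would not be found this way. The function names are
hypothetical.]

```python
import os
import signal


def read_job_id(environ_path):
    """Return the PBS_JOBID stored in a /proc/<pid>/environ file, or None."""
    try:
        with open(environ_path, "rb") as fh:
            raw = fh.read()
    except OSError:
        # Process already exited, or we may not read it: not a match.
        return None
    # Entries in environ are NUL-separated NAME=value pairs.
    for entry in raw.split(b"\0"):
        if entry.startswith(b"PBS_JOBID="):
            return entry[len(b"PBS_JOBID="):].decode(errors="replace")
    return None


def pids_for_job(job_id, proc_root="/proc"):
    """List pids whose environment names the given job id."""
    pids = []
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue  # skip non-process entries such as "self"
        if read_job_id(os.path.join(proc_root, name, "environ")) == job_id:
            pids.append(int(name))
    return sorted(pids)


def kill_job(job_id, sig=signal.SIGKILL):
    """Signal every process found for the job, except this script itself."""
    for pid in pids_for_job(job_id):
        if pid == os.getpid():
            continue
        try:
            os.kill(pid, sig)
        except ProcessLookupError:
            pass  # process exited between the scan and the kill
```

[An epilogue script receives the job id as its first argument, so it
could pass that value to kill_job(). Because the match is on job id
rather than username, a user's other jobs on the same node are left
alone.]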

I haven't done a release of pbstools in a while...  [makes mental note
to do one soon]

	--Troy
-- 
Troy Baer                       troy at osc.edu
Science & Technology Support    http://www.osc.edu/hpc/
Ohio Supercomputer Center       614-292-9701


