[torqueusers] Epilogue script
ckirby3 at colsa.com
Tue Aug 22 08:42:38 MDT 2006
I currently use an epilogue script to kill all of the user's PIDs, but that
is not the best solution. Tracking down the child processes of an mpirun
parallel job is not easy, because each node participating in the parallel
job assigns its own PIDs to the job's processes.
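Here is a minimal Python sketch of what following child processes involves, and where it stops working. It assumes a Linux node where /proc/<pid>/stat is readable. The helper names read_parent_map and descendants are hypothetical and are not part of Torque. The sketch builds a PID-to-parent map and walks down from a starting PID (for example, the job's top-level shell) to list that process's descendants on the local node.

```python
import os


def read_parent_map(proc_root="/proc"):
    """Map each PID to its parent PID by reading /proc/<pid>/stat (Linux only)."""
    parents = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                stat = f.read()
        except OSError:
            continue  # process exited between listdir() and open()
        # Field 2 (the command name) is in parentheses and may contain spaces,
        # so split after the last ')': fields[0] is the state, fields[1] the PPID.
        fields = stat[stat.rfind(")") + 2:].split()
        parents[int(entry)] = int(fields[1])
    return parents


def descendants(root, parents):
    """Return every PID whose ancestry leads back to `root` (root excluded)."""
    children = {}
    for pid, ppid in parents.items():
        children.setdefault(ppid, []).append(pid)
    found, stack = [], [root]
    while stack:
        for child in children.get(stack.pop(), []):
            found.append(child)
            stack.append(child)
    return sorted(found)
```

The walk only sees processes on the local node. Ranks that mpirun starts on other nodes through rsh or ssh are not children of anything on this node. A process whose parent has exited is reparented (normally to init), so it drops out of the tree. That is the gap described above.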
I hope your question gets answered, because I want the same thing you do.
From: torqueusers-bounces at supercluster.org
[mailto:torqueusers-bounces at supercluster.org] On Behalf Of Eugene van den
Sent: Tuesday, August 22, 2006 4:37 AM
To: torqueusers at supercluster.org
Subject: [torqueusers] Epilogue script
I am looking at implementing torque on our cluster.
I have been looking at using an epilogue script to clean up after
jobs, particularly if the job is aborted or deleted.
This seems particularly necessary when running jobs with MPICH and
mpirun.
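An epilogue receives the job's details as positional arguments. The Python sketch below assumes the argument order described in the Torque administrator documentation: job ID, user, group, job name, session ID, resource limits, resources used, queue, account. Verify this order against your installed version. EpilogueArgs and parse_epilogue_args are hypothetical names.

```python
from typing import List, NamedTuple


class EpilogueArgs(NamedTuple):
    """Positional epilogue arguments, in the order Torque is assumed to pass them."""
    job_id: str
    user: str
    group: str
    job_name: str
    session_id: str
    resource_limits: str
    resources_used: str
    queue: str
    account: str


def parse_epilogue_args(argv: List[str]) -> EpilogueArgs:
    """Turn sys.argv into named fields; argv[0] is the script path itself."""
    if len(argv) < 3:
        raise ValueError("expected at least a job ID and a user name")
    # Pad any missing trailing fields with "" and ignore extras.
    values = (list(argv[1:]) + [""] * 9)[:9]
    return EpilogueArgs(*values)
```

A cleanup epilogue would start with `args = parse_epilogue_args(sys.argv)` and then use `args.job_id` or `args.session_id` to decide which processes belong to the finished job, instead of matching on `args.user` alone.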
I have looked at using mpiexec instead of mpirun. I installed mpiexec
and it seems to work fine.
Can anyone think of any reason why using mpiexec instead of mpirun is
a bad idea?
If I use mpiexec instead of mpirun, would I be right in thinking that it
is still a good idea to use epilogue scripts for other types of jobs?
Each node is dual-processor, so I do not want to kill processes based
on username, as a user may have more than one job on a node.
So it looks like I would need a script that can kill orphaned
processes based on job ID.
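One way to kill leftover processes by job ID rather than by username is to find processes carrying the job's PBS_JOBID environment variable, which Torque sets in the job's environment. This is a minimal Python sketch. It assumes Linux /proc and an epilogue running as root so that other users' /proc/<pid>/environ files are readable. The function names are hypothetical. Processes that mpirun starts through rsh or ssh usually do not inherit PBS_JOBID. That is one reason launchers such as mpiexec, which start processes through Torque's task manager, make this kind of cleanup more reliable.

```python
import os
import signal


def environ_has_job(environ: bytes, job_id: str) -> bool:
    """True if a NUL-separated environment block sets PBS_JOBID=<job_id> exactly."""
    target = b"PBS_JOBID=" + job_id.encode()
    return target in environ.split(b"\0")


def pids_for_job(job_id, proc_root="/proc"):
    """List PIDs whose starting environment names this job (Linux /proc only)."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "environ"), "rb") as f:
                environ = f.read()
        except OSError:
            continue  # process exited, or its environ is not readable by us
        if environ_has_job(environ, job_id):
            pids.append(int(entry))
    return pids


def kill_job_processes(job_id, sig=signal.SIGKILL):
    """Signal every local process tagged with job_id; return the PIDs signalled."""
    killed = []
    for pid in pids_for_job(job_id):
        if pid == os.getpid():
            continue  # never signal the cleanup script itself
        try:
            os.kill(pid, sig)
            killed.append(pid)
        except ProcessLookupError:
            pass  # already exited
    return killed
```

Because the match is on the exact job ID, a second job owned by the same user on the same dual-processor node is left alone. A gentler variant calls `kill_job_processes(job_id, signal.SIGTERM)` first and repeats with SIGKILL only for processes that survive.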
Does anyone have suggestions for how I could do this, or sample scripts
that I could try?
Any help would be greatly appreciated.