[torqueusers] Epilogue script

Cliff Kirby ckirby3 at colsa.com
Tue Aug 22 08:42:38 MDT 2006


I currently use an epilogue script to kill all the PIDs of the user, but that
is not the best solution.  Tracking down the child processes of an mpirun
parallel job is not an easy task because each cluster system participating
in the parallel job creates its own PIDs for the job.
I hope your question gets answered, because I want the same thing you do.
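
One direction that might be worth trying, instead of matching on username, is
to match on the PBS_JOBID variable in each process's environment.  What follows
is only a rough sketch in Python, not something proven on a production cluster.
It assumes a Linux /proc filesystem, that the script runs as root on each node,
and that the job's processes actually inherited PBS_JOBID.  Processes started
outside pbs_mom (for example over rsh by mpirun) may not carry the variable, in
which case this will not find them.  The function names are made up for
illustration.

```python
import os
import signal


def environ_has_job(environ_bytes, job_id):
    """Return True if a NUL-separated environ blob holds PBS_JOBID=<job_id>."""
    wanted = b"PBS_JOBID=" + job_id.encode()
    # Split on NUL so that job "42.head" does not match "42.headnode".
    return wanted in environ_bytes.split(b"\0")


def pids_for_job(job_id, proc_root="/proc"):
    """List PIDs under proc_root whose environment names this job."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "environ"), "rb") as f:
                data = f.read()
        except OSError:
            # The process exited, or we may not read its environment; skip it.
            continue
        if environ_has_job(data, job_id):
            pids.append(int(entry))
    return sorted(pids)


def kill_job_processes(job_id, sig=signal.SIGTERM, proc_root="/proc"):
    """Signal every matching process except this one; return PIDs signalled."""
    signalled = []
    for pid in pids_for_job(job_id, proc_root):
        if pid == os.getpid():
            continue  # never signal the cleanup script itself
        try:
            os.kill(pid, sig)
            signalled.append(pid)
        except OSError:
            continue  # already gone, or not permitted
    return signalled
```

An epilogue could call kill_job_processes() with the job id that TORQUE passes
as the first epilogue argument.  Because the match is on job id rather than
username, a second job by the same user on a dual-processor node should be left
alone.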

- Cliff

-----Original Message-----
From: torqueusers-bounces at supercluster.org
[mailto:torqueusers-bounces at supercluster.org] On Behalf Of Eugene van den
Hurk
Sent: Tuesday, August 22, 2006 4:37 AM
To: torqueusers at supercluster.org
Subject: [torqueusers] Epilogue script


Hello,

I am looking at implementing torque on our cluster.
I have been looking at using an epilogue script to clean up after 
jobs, particularly if the job is aborted or deleted.
This seems to be particularly needed in the case when running jobs 
using mpich and mpirun.
I have looked at using mpiexec instead of mpirun. I installed mpiexec 
and it seems to work fine.
Can anyone think of any reason why using mpiexec instead of mpirun is 
a bad idea?
If I use mpiexec instead of mpirun, would I be right in thinking that 
it is still a good idea to use epilogue scripts for other types of jobs?
Each node is dual processor so I do not want to kill processes based 
on username, as a user may have more than one job on a node.
So it looks like I would have to use a script that would be able to 
kill orphaned processes based on job id.
Would anyone have any suggestions as to how I could do this or sample 
scripts that I could try?
Any help would be greatly appreciated.

Thanks,
Regards,
Eugene.

_______________________________________________
torqueusers mailing list
torqueusers at supercluster.org
http://www.supercluster.org/mailman/listinfo/torqueusers


