[torqueusers] Processes are not killed when walltime is reached

Glen Beane glen.beane at gmail.com
Fri Feb 26 12:08:33 MST 2010


On Fri, Feb 26, 2010 at 11:43 AM, chris job.fr <chrisjob.fr at gmail.com> wrote:

> Hi,
>
> We use Torque/PBS 2.1.6, maui-3.2.6p21 and mpich-1.2.7p1 on a cluster.
>
> We use the mpirun command to launch jobs, and we sometimes have the
> following problem: when the walltime is reached, not all of the processes
> on the nodes are killed. Someone told me to write an epilog, but I don't
> know how to do it.
>


You don't need an epilog to solve this problem. Use OSC's mpiexec job
launcher (http://www.osc.edu/~djohnson/mpiexec/index.php) to replace mpirun.
Because this replacement launcher uses TORQUE's TM API instead of ssh to
start the remote processes, TORQUE is aware of all processes that belong to
the job and will properly clean them up when the job hits its walltime.
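Why the launcher matters can be modelled locally with a small Python sketch. Roughly speaking, each node's pbs_mom keeps track of the tasks it started for a job and signals those processes when the walltime runs out; a process started through ssh is a child of sshd instead, so the MOM never learns about it. This is a simplified, Linux-only simulation under that assumption, not TORQUE code: the job leader gets its own session, one worker stays in it (standing in for a TM-spawned task), another calls setsid() (standing in for an ssh-launched process), and "walltime" is a SIGKILL sent to the job's process group. JOB_SCRIPT, is_running, wait_until_gone and simulate_walltime_kill are hypothetical names invented for illustration.

```python
import os
import signal
import subprocess
import sys
import time

# Hypothetical job script. The first worker stays in the job's session and
# process group (standing in for a task pbs_mom started via TM); the second
# calls setsid() through start_new_session=True (standing in for a process
# launched over ssh, which pbs_mom never hears about).
JOB_SCRIPT = r"""
import subprocess, time
tracked = subprocess.Popen(["sleep", "60"])
escaped = subprocess.Popen(["sleep", "60"], start_new_session=True)
print(tracked.pid, escaped.pid, flush=True)
time.sleep(60)
"""


def is_running(pid):
    """True if /proc lists the PID and it is not a zombie (Linux only)."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            # Format is "pid (comm) state ..."; comm may itself contain ')'.
            state = f.read().rsplit(")", 1)[1].split()[0]
    except FileNotFoundError:
        return False
    return state not in ("Z", "X")


def wait_until_gone(pid, timeout=2.0):
    """Poll until the PID is no longer running, or give up after timeout s."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if not is_running(pid):
            return True
        time.sleep(0.05)
    return False


def simulate_walltime_kill():
    """Return (tracked_worker_killed, escaped_worker_survived)."""
    # The job leader gets a fresh session, so its PID is also its PGID.
    job = subprocess.Popen([sys.executable, "-c", JOB_SCRIPT],
                           stdout=subprocess.PIPE, text=True,
                           start_new_session=True)
    tracked_pid, escaped_pid = map(int, job.stdout.readline().split())

    # "Walltime reached": SIGKILL everything in the job's process group.
    os.killpg(job.pid, signal.SIGKILL)
    job.wait()
    job.stdout.close()

    tracked_killed = wait_until_gone(tracked_pid)
    escaped_survived = is_running(escaped_pid)

    # The survivor is the kind of stray process an epilog would have to
    # hunt down; here we just kill it so the demo leaves nothing behind.
    if escaped_survived:
        os.kill(escaped_pid, signal.SIGKILL)
    return tracked_killed, escaped_survived
```

On a Linux machine, simulate_walltime_kill() should report that the tracked worker died with the job while the escaped one kept running until the sketch killed it explicitly, which mirrors the stray processes described in the question.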

