[torqueusers] Can I control whether a job dies or not?

Leandro leotavaneiro at gmail.com
Wed Aug 10 05:23:24 MDT 2005


Hi,

I have a very clever application that can dynamically distribute the load 
across the nodes allocated to run a job. If one node dies in the middle of 
the computation, the application can continue on the other nodes, and 
another process can pick up the unfinished work of the dead node and 
complete it.
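
The recovery behavior described above can be illustrated with a small simulation. This is only a sketch in Python, not the actual Fortran/MPICH application: the function name `run_with_failover`, the node names, and the `dead_after` failure model (a node dies when it is handed a task after completing a given number of tasks) are all assumptions made up for the illustration.

```python
from collections import deque


def run_with_failover(tasks, workers, dead_after):
    """Simulate dynamic load distribution with node failure.

    dead_after is a hypothetical failure model: it maps a worker name to
    the number of tasks it completes before dying.
    Returns a dict mapping each task to the worker that completed it.
    """
    pending = deque(tasks)                 # work not yet completed
    alive = list(workers)                  # workers still running
    completed = {w: 0 for w in workers}    # tasks finished per worker
    done = {}

    while pending:
        if not alive:
            raise RuntimeError("all workers died before the work finished")
        for w in list(alive):
            if not pending:
                break
            task = pending.popleft()
            if w in dead_after and completed[w] >= dead_after[w]:
                # The worker dies while holding this task: drop the worker
                # and put the unfinished task back for the survivors.
                alive.remove(w)
                pending.append(task)
                continue
            done[task] = w
            completed[w] += 1
    return done


if __name__ == "__main__":
    result = run_with_failover(range(6), ["n1", "n2", "n3"], {"n2": 1})
    for task in sorted(result):
        print(f"task {task} completed by {result[task]}")
```

In this sketch the task held by the dying node is simply returned to the queue and finished by a surviving node, which is the property that makes killing the whole job unnecessary.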

This application is written in Fortran and we are using MPICH. The 
application has no need to communicate: the processes don't share 
data, so they are very independent.

We use mpiexec to start the processes in Torque, and I can remove the "-kill" 
parameter so that the processes on the nodes keep going, but the default 
behavior of PBS/Torque is to kill the job when a node dies. Can I change this 
behavior? If there's no way to do that through some kind of configuration, can 
someone point me to the place in the code where I can work on this?

Thanks 

-- 
Leandro Tavares Carneiro
Linux/Unix Support Analyst