[torqueusers] Can I control whether a job dies or not?
Leandro
leotavaneiro at gmail.com
Wed Aug 10 05:23:24 MDT 2005
Hi,
I have an application that can dynamically distribute the load across the
nodes allocated to a job. If one node dies in the middle of the
computation, the application carries on with the remaining nodes, and
another process picks up the dead node's unfinished work and completes it.
The application is written in Fortran and we are using MPICH. It has no
need to communicate: the processes don't share data, so they are fully
independent of one another.
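To illustrate the recovery scheme described above, here is a minimal sketch
in Python. It is not the real application (which is Fortran on MPICH); the
TaskPool class, the run function, and the node names are hypothetical, and
the whole thing models the behaviour inside a single process rather than
across real nodes.

```python
from collections import deque


class TaskPool:
    """Hypothetical pool of independent tasks (not the real application)."""

    def __init__(self, tasks):
        self.pending = deque(tasks)   # tasks nobody has started yet
        self.in_progress = {}         # worker name -> task it is running
        self.done = []                # completed tasks

    def request(self, worker):
        """Hand the next pending task to a worker, or None if none are left."""
        if not self.pending:
            return None
        task = self.pending.popleft()
        self.in_progress[worker] = task
        return task

    def complete(self, worker):
        """Record that the worker finished its current task."""
        self.done.append(self.in_progress.pop(worker))

    def worker_died(self, worker):
        """Requeue the dead worker's unfinished task so another can take it."""
        task = self.in_progress.pop(worker, None)
        if task is not None:
            self.pending.append(task)


def run(tasks, workers, dies_on):
    """Simulate a job; dies_on maps a worker to the task on which it dies."""
    pool = TaskPool(tasks)
    alive = list(workers)
    while alive and pool.pending:
        for worker in list(alive):
            task = pool.request(worker)
            if task is None:
                continue
            if dies_on.get(worker) == task:
                # Assumption: the node is lost mid-task and never comes back.
                pool.worker_died(worker)
                alive.remove(worker)
            else:
                pool.complete(worker)
    return sorted(pool.done)


if __name__ == "__main__":
    # Node "n2" dies while working on task 1; the others finish everything.
    print(run(range(6), ["n1", "n2", "n3"], {"n2": 1}))
```

In this sketch the job only fails if every worker dies; losing a single
node just means its task goes back into the queue.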
We use mpiexec to start the processes under Torque. I can remove the "-kill"
parameter so that the processes on the other nodes keep going, but the
default behavior of PBS/Torque is to kill the job when a node dies. Can I
change this behavior? If there is no way to do that through configuration,
can someone point me to the place in the code where I could work on this?
Thanks
--
Leandro Tavares Carneiro
Linux/Unix Support Analyst