[torqueusers] Torque Ignores PIDS of MPI processes

Chris Samuel csamuel at vpac.org
Mon Jul 23 18:49:06 MDT 2007


On Mon, 23 Jul 2007, Joshua Bernstein wrote:

> In my mind, a job is really a shell script that gets started by a
> pbs_mom. If I call mpirun from inside that job, which then forks new
> processes, TORQUE should be able to track which PIDs a particular job
> has spawned by looking at the PPID to PID relationship.

The problem is that a lot of MPI launchers use ssh/rsh to fire off processes
on compute nodes. Those processes end up as children of sshd/rshd rather than
of a pbs_mom, so at that point signals stop getting reliably propagated and
Torque cannot track them.
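As a rough illustration of why the PPID walk falls short, here is a small
Python sketch. The process tables and PIDs are made up for the example (they
are not taken from a real cluster, and this is not how pbs_mom is actually
implemented); it only shows that following PPID links from the job's shell
never reaches a process that was started through sshd on another node.

```python
from collections import defaultdict

def descendants(ppid_of, root):
    """Return every PID reachable from root by following parent -> child links."""
    children = defaultdict(list)
    for pid, ppid in ppid_of.items():
        children[ppid].append(pid)
    found, stack = set(), [root]
    while stack:
        for child in children[stack.pop()]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found

# Hypothetical PID -> PPID tables, one per node (all numbers are invented).
# Node A: pbs_mom (100) -> job shell (200) -> mpirun (210) -> local rank (220)
#         and the ssh client (230) that mpirun uses to reach node B.
node_a = {100: 1, 200: 100, 210: 200, 220: 210, 230: 210}
# Node B: pbs_mom (100) and sshd (50) are both children of init;
#         the remote rank (310) hangs off the sshd session (300).
node_b = {100: 1, 50: 1, 300: 50, 310: 300}

tracked_on_a = descendants(node_a, 200)   # walk from the job shell
tracked_on_b = descendants(node_b, 100)   # walk from node B's pbs_mom

print(sorted(tracked_on_a))
print(sorted(tracked_on_b))
print(310 in tracked_on_b)
```

On node A the walk finds mpirun, the local rank and the ssh client. On node B
it finds nothing, because the remote rank's ancestry leads back to sshd and
not to the pbs_mom.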

This is why mpiexec is a better solution: it uses the PBS tm_spawn() 
function to get the pbs_mom to launch the process rather than rsh/ssh, and so 
Torque has a pretty good idea of what's associated with the job.

cheers,
Chris
-- 
Christopher Samuel - (03) 9925 4751 - Systems Manager
 The Victorian Partnership for Advanced Computing
 P.O. Box 201, Carlton South, VIC 3053, Australia
VPAC is a not-for-profit Registered Research Agency