[torqueusers] Torque Ignores PIDS of MPI processes
Chris Samuel
csamuel at vpac.org
Mon Jul 23 18:49:06 MDT 2007
On Mon, 23 Jul 2007, Joshua Bernstein wrote:
> In my mind, a job is really a shell script that gets started by a
> pbs_mom. If I call mpirun from inside that job, which then forks new
> processes, TORQUE should be able to track which PIDs a particular job
> has spawned by looking at the PPID to PID relationship.
The problem is that a lot of MPI launchers use ssh/rsh to fire off jobs on
compute nodes, and at that point signals stop getting reliably propagated and
Torque cannot track them.
This is why mpiexec is a better solution: it uses the PBS tm_spawn()
function to get the pbs_mom to launch the process rather than rsh/ssh, and so
Torque has a pretty good idea of what's associated with the job.
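
To make the difference concrete, here is a rough sketch in Python. It walks
PPID links in a made-up process table for one compute node. The PIDs, the
command names and the descendants() helper are all invented for illustration,
and it assumes pbs_mom is the parent of whatever it launches itself; it is not
how Torque is actually implemented.

```python
# Hypothetical process table for one compute node: pid -> (ppid, command).
# Every PID and command name here is invented for illustration.
PROCESS_TABLE = {
    1: (0, "init"),
    100: (1, "pbs_mom"),
    200: (1, "sshd"),
    112: (100, "mpi_rank_via_tm"),   # started by pbs_mom (tm_spawn-style launch)
    210: (200, "sshd: user"),        # per-connection sshd from an ssh launcher
    211: (210, "mpi_rank_via_ssh"),  # started through ssh, parent is sshd
}


def descendants(table, root):
    """Return the set of PIDs reachable from root by following PPID links."""
    # Build a parent -> children index from the pid -> ppid table.
    children = {}
    for pid, (ppid, _cmd) in table.items():
        children.setdefault(ppid, []).append(pid)

    # Depth-first walk down from root.
    found = set()
    stack = [root]
    while stack:
        for child in children.get(stack.pop(), []):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


under_mom = descendants(PROCESS_TABLE, 100)
under_sshd = descendants(PROCESS_TABLE, 200)
```

In this toy table the rank launched through pbs_mom shows up under pbs_mom
(PID 100), while the rank launched over ssh only shows up under sshd
(PID 200), so a walk from pbs_mom never finds it.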
cheers,
Chris
--
Christopher Samuel - (03) 9925 4751 - Systems Manager
The Victorian Partnership for Advanced Computing
P.O. Box 201, Carlton South, VIC 3053, Australia
VPAC is a not-for-profit Registered Research Agency