[torqueusers] jobs completing with processes still running - SOLVED

Chris Samuel csamuel at vpac.org
Thu May 8 21:08:34 MDT 2008


----- "Lloyd Brown" <lloyd_brown at byu.edu> wrote:

> If you do decide to use OpenMPI, though, be sure that the version you
> install has the TM interface enabled.

Also be aware that you don't want to use the current
cpusets implementation in Torque 2.3.0 with OpenMPI
as it stands.

This is because the current implementation assumes
that there will be a tm_spawn() for each MPI child on
each node, so that it can put each one into a
single-CPU cpuset based on the vnode.

Unfortunately OpenMPI (and probably LAM too) makes
only one tm_spawn() call per *node* it will be using,
to start its daemon (orted), and that daemon then
spawns the MPI processes on the node.

So with the current implementation you end up with
all the MPI processes on a node sharing a single CPU,
which is a Bad Thing(tm).
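
To make that concrete, here is a toy Python model of the
placement logic described above. It is only a sketch, not
Torque code: assign_cpus() and the CPU numbering are made up
for illustration, and it assumes that each tm_spawn() gets
the next single-CPU cpuset on the node and that anything the
spawned process starts stays in that same cpuset.

```python
def assign_cpus(spawns_per_node, procs_per_spawn, node_cpus):
    """Return the CPU each process lands on, one list entry per process.

    Hypothetical model: every tm_spawn() is handed the next single-CPU
    cpuset on the node, and all processes started by the spawned task
    remain inside that cpuset.
    """
    placement = []
    for spawn_index in range(spawns_per_node):
        cpu = node_cpus[spawn_index % len(node_cpus)]
        # The spawned task and everything it starts share this one CPU.
        placement.extend([cpu] * procs_per_spawn)
    return placement


node_cpus = [0, 1, 2, 3]  # a 4-CPU node, 4 MPI processes wanted on it

# What the cpuset code assumes: one tm_spawn() per MPI process.
expected = assign_cpus(spawns_per_node=4, procs_per_spawn=1, node_cpus=node_cpus)

# What happens with OpenMPI: one tm_spawn() per node (orted),
# which then starts all 4 MPI processes itself.
actual = assign_cpus(spawns_per_node=1, procs_per_spawn=4, node_cpus=node_cpus)

print("one tm_spawn per process:", expected)  # [0, 1, 2, 3]
print("one tm_spawn per node:   ", actual)    # [0, 0, 0, 0]
```

In the second case all four processes are confined to CPU 0
while the other three CPUs on the node sit idle.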

cheers!
Chris
-- 
Christopher Samuel - (03) 9925 4751 - Systems Manager
 The Victorian Partnership for Advanced Computing
 P.O. Box 201, Carlton South, VIC 3053, Australia
VPAC is a not-for-profit Registered Research Agency

