[torqueusers] IB epilogue/prolog, and any other concerns

Chris Samuel csamuel at vpac.org
Fri May 30 05:37:19 MDT 2008


----- "Walid" <walid.shaari at gmail.com> wrote:

> Hi Chris,

Hiya,

> 2008/5/30 Chris Samuel <csamuel at vpac.org>:
> 
> > We started off with MVAPICH but moved to OpenMPI when we
> > found we had real problems getting any job larger than 64
> > CPUs to start with it.
> 
> I am talking about a much larger number than that in terms of
> cores, and wondering what the overhead on connections would be,
> and as a result on the system resources associated with each
> connection. That most likely means there are some parameters
> that I need to tune and have in place.

I don't know, but having hit that bug we decided it was easier
to go to OpenMPI which just worked for us.

> > OpenMPI also has *much* better error messages, and doesn't
> > have the dumb idea of enabling CPU affinity by default on
> > AMD64 systems (though that might be fixed by now).
> 
> you mean OpenMPI does handle CPU affinity by default or that is
> something I should be worried about?

No, I mean that OpenMPI doesn't try and do CPU affinity
by default, whereas MVAPICH was doing it (and doing it wrong).

> On AMD64 I most likely should worry; however, we are
> using Intel Harpertown E5450. Unless the developer
> submits a 2 core job on an 8 core node, should I worry
> about how to make sure that each core is a different CPU
> at least?

The Torque cpusets code would help (if you patch it, at least
until the latest patches hit the 2.3-fixes branch in SVN
and a new snapshot is rolled), as long as you change
the code to use the jobset rather than the per-vnode set.
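
To illustrate why a cpuset helps, here is a minimal sketch of a
process picking its core from the set of CPUs the kernel allows it,
rather than assuming it owns cores 0..N-1. This is not how Torque or
OpenMPI do it; the function names are hypothetical, and
os.sched_getaffinity is only available on Linux.

```python
import os

def allowed_cpus():
    """Return the sorted CPUs this process may run on."""
    if hasattr(os, "sched_getaffinity"):
        # On Linux this reflects any cpuset the job was placed in.
        return sorted(os.sched_getaffinity(0))
    # Assumption: without sched_getaffinity, treat all CPUs as allowed.
    return list(range(os.cpu_count() or 1))

def core_for_rank(local_rank, cpus):
    """Map a node-local rank (hypothetical input) onto one allowed CPU."""
    return cpus[local_rank % len(cpus)]

if __name__ == "__main__":
    cpus = allowed_cpus()
    print("allowed:", cpus, "rank 0 would use core", core_for_rank(0, cpus))
```

Inside a cpuset covering cores 4-7, rank 0 would land on core 4
rather than core 0, so a second job on the node is left alone.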

> > Their code naively binds from cores 0->N, which is fine
> > until you run two 4 CPU codes on an 8 core node and wonder
> > why they're running at half speed compared to just running
> > one job on its own.. :-(
> 
> Interesting!

That's one way of putting it!  Basically you end up
with 2 x 4 core jobs fighting over the first 4 cores
of the system, whilst the other 4 cores sit around
twiddling their thumbs wondering why the rest are so
busy..
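
A toy model of that, purely for illustration (this is not MVAPICH's
code, and the function names are made up): count how many processes
end up bound to each core when every job starts binding from core 0,
versus when each job starts after the cores already handed out.

```python
from collections import Counter

def naive_bind(job_sizes):
    """Every job binds its processes to cores 0..n-1, ignoring other jobs."""
    return [list(range(n)) for n in job_sizes]

def offset_bind(job_sizes):
    """Every job starts binding after the cores already handed out."""
    bindings, start = [], 0
    for n in job_sizes:
        bindings.append(list(range(start, start + n)))
        start += n
    return bindings

def load_per_core(bindings, ncores):
    """Number of processes bound to each core of an ncores node."""
    counts = Counter(core for job in bindings for core in job)
    return [counts.get(core, 0) for core in range(ncores)]

print(load_per_core(naive_bind([4, 4]), 8))   # [2, 2, 2, 2, 0, 0, 0, 0]
print(load_per_core(offset_bind([4, 4]), 8))  # [1, 1, 1, 1, 1, 1, 1, 1]
```

With the naive scheme cores 0-3 each carry two processes and cores
4-7 carry none, which is where the half speed comes from.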

cheers!
Chris

-- 
Christopher Samuel - (03) 9925 4751 - Systems Manager
 The Victorian Partnership for Advanced Computing
 P.O. Box 201, Carlton South, VIC 3053, Australia
VPAC is a not-for-profit Registered Research Agency

