[torqueusers] IB epilogue/prolog, and any other concerns
Chris Samuel
csamuel at vpac.org
Fri May 30 05:37:19 MDT 2008
----- "Walid" <walid.shaari at gmail.com> wrote:
> Hi Chris,
Hiya,
> 2008/5/30 Chris Samuel <csamuel at vpac.org>:
>
> > We started off with MVAPICH but moved to OpenMPI when we
> > found we had real problems getting any job larger than 64
> > CPUs to start with it.
>
> I am talking about a much larger number of cores than that, and
> I am wondering what the overhead on connections would be, and
> what system resources each connection consumes. Most likely
> that means there are some parameters that I need to tune and
> have in place.
I don't know, but having hit that bug we decided it was easier
to go to OpenMPI which just worked for us.
> > OpenMPI also has *much* better error messages, and doesn't
> > have the dumb idea of enabling CPU affinity by default on
> > AMD64 systems (though that might be fixed by now).
>
> Do you mean OpenMPI does handle CPU affinity by default, or is
> that something I should be worried about?
No, I mean that OpenMPI doesn't try and do CPU affinity
by default, whereas MVAPICH was doing it (and doing it wrong).
> On AMD64 I most likely should worry; however, we are
> using Intel Harpertown E5450. Except when a developer
> submits a 2 core job to an 8 core node, should I worry
> about how to make sure that each core is at least a
> different CPU?
The Torque cpusets code would help, provided you patch it (at
least until the latest patches hit the 2.3-fixes branch in SVN
and a new snapshot is rolled) and change the code to use the
jobset rather than the per-vnode set.
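As a quick sanity check from inside a job, something like the
following shows which cores a process is actually allowed to run
on. This is only a sketch and assumes a Linux node with Python
available; the helper name allowed_cpus is made up for
illustration and has nothing to do with Torque itself. On Linux
the affinity mask reported here is limited by any cpuset the
process has been placed in.

```python
import os


def allowed_cpus():
    """Return the sorted list of CPU ids this process may run on.

    os.sched_getaffinity exists on Linux only, so fall back to
    "all CPUs" on platforms that do not provide it.
    """
    if hasattr(os, "sched_getaffinity"):
        return sorted(os.sched_getaffinity(0))  # 0 means "this process"
    return list(range(os.cpu_count() or 1))


cpus = allowed_cpus()
print("this process may run on %d core(s): %s" % (len(cpus), cpus))
```

If two jobs sharing a node both report the same short list of
cores, they are competing for them.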
> > Their code naively binds from cores 0->N, which is fine
> > until you run two 4 CPU codes on an 8 core node and wonder
> > why they're running at half speed compared to just running
> > one job on its own.. :-(
>
> Interesting!
That's one way of putting it! Basically you end up
with 2 x 4 core jobs fighting over the first 4 cores
of the system, whilst the other 4 cores sit around
twiddling their thumbs wondering why the rest are so
busy..
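To make that concrete, here is a small Python model of the
situation. It is only an illustration, not MVAPICH's actual
code: the function names (naive_binding, free_core_binding,
core_load) are made up, and the second strategy is just one
obvious alternative that skips cores already taken.

```python
from collections import Counter

TOTAL_CORES = 8


def naive_binding(num_ranks):
    """Bind rank i to core i, ignoring any other job on the node."""
    return list(range(num_ranks))


def free_core_binding(num_ranks, cores_in_use, total_cores=TOTAL_CORES):
    """Bind ranks to the lowest-numbered cores that are not in use."""
    free = [c for c in range(total_cores) if c not in cores_in_use]
    if len(free) < num_ranks:
        raise ValueError("not enough free cores on this node")
    return free[:num_ranks]


def core_load(jobs):
    """Count how many processes end up bound to each core."""
    return Counter(core for job in jobs for core in job)


# Two 4-process jobs, each binding 0->N without looking at the other.
naive_load = core_load([naive_binding(4), naive_binding(4)])

# The same two jobs, where the second avoids the first job's cores.
first = free_core_binding(4, set())
second = free_core_binding(4, set(first))
aware_load = core_load([first, second])

for core in range(TOTAL_CORES):
    print("core %d: naive=%d aware=%d"
          % (core, naive_load[core], aware_load[core]))
```

In the naive case cores 0-3 each carry two processes and cores
4-7 carry none, which is where the half speed comes from.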
cheers!
Chris
--
Christopher Samuel - (03) 9925 4751 - Systems Manager
The Victorian Partnership for Advanced Computing
P.O. Box 201, Carlton South, VIC 3053, Australia
VPAC is a not-for-profit Registered Research Agency