[torqueusers] how can use infiniband nodes in torque
jbernstein at penguincomputing.com
Wed Sep 10 12:27:06 MDT 2008
If you are using MVAPICH (and not MPICH), InfiniBand should give
your application a real performance advantage. Since you aren't
seeing any change in performance, one of two things is likely going on:
1) You think you are running over IB when actually you are not. Make
sure that your PATH and LD_LIBRARY_PATH are set correctly to point
to the MVAPICH binaries and libraries rather than, say, the MPICH ones.
You may even try running some simple programs outside of TORQUE to
verify that IB and MVAPICH are actually working. I suggest the
osu_latency.c test. When the test is running over Ethernet, latencies
should be in the tens of microseconds, likely something close to
50 to 60. When the test runs over IB, the times should be in the
single digits, say something like 5 to 6 microseconds.
2) If the runtimes are close but not actually identical, there is a
chance your code simply doesn't take advantage of IB and does
not do very much inter-node communication. Such codes are usually bound
by disk I/O or memory access rather than by communication latency.
Generally, CFD-style codes tend to scale well and will perform better
over IB. FEA or other types of structural codes may not scale as well,
and may be more disk or memory bound than network bound.
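To make the latency check in point 1 less of an eyeball exercise, here is a
rough Python sketch that reads osu_latency-style output and applies the rule
of thumb above (single-digit microseconds looks like IB, tens of microseconds
looks like Ethernet). The two-column "size latency" layout with "#" header
lines is my assumption about the benchmark's output, the function names are
made up for illustration, and the 10 microsecond cutoff is only a
convenience, not a hard boundary. The numbers in the sample are invented,
not measurements.

```python
def parse_osu_latency(text):
    """Return {message_size_bytes: latency_us} from osu_latency-style output.

    Assumes '#' header lines followed by two whitespace-separated columns
    (size, latency in microseconds). Lines that do not fit are skipped.
    """
    results = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) != 2:
            continue
        try:
            size, latency = int(fields[0]), float(fields[1])
        except ValueError:
            continue
        results[size] = latency
    return results


def small_message_latency(results):
    """Latency reported for the smallest message size in the run."""
    return results[min(results)]


def guess_interconnect(latency_us, ib_ceiling_us=10.0):
    """Rule of thumb only: below the ceiling looks like IB, else Ethernet."""
    return "infiniband-like" if latency_us < ib_ceiling_us else "ethernet-like"


# Invented sample in the assumed format, for illustration only.
SAMPLE = """\
# OSU MPI Latency Test
# Size          Latency (us)
0                       5.42
1                       5.47
"""

if __name__ == "__main__":
    latency = small_message_latency(parse_osu_latency(SAMPLE))
    print(latency, guess_interconnect(latency))
```

Feed it the output of a two-node run made outside of TORQUE first, then the
output of the same run submitted through TORQUE. If the two disagree, the
job environment is the likely culprit.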
Also, while it's true that OpenMPI generally performs better than
MVAPICH, before going through the hassle of switching you should first
check whether you are actually running over IB, rather than jumping
to conclusions.
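For the PATH and LD_LIBRARY_PATH check from point 1, a small Python sketch
along these lines shows which directory wins for each MPI command. The
helper names are hypothetical, and "mpicc" and "mpirun" are just the
commands I would expect to look for; substitute whatever your MVAPICH
install actually provides.

```python
import os


def first_match(name, search_path):
    """Return the first directory in a search path that holds `name`, or None."""
    for directory in search_path.split(os.pathsep):
        if directory and os.path.isfile(os.path.join(directory, name)):
            return directory
    return None


def report(names=("mpicc", "mpirun"), env=None):
    """Map each command name to the PATH directory that provides it."""
    env = os.environ if env is None else env
    return {name: first_match(name, env.get("PATH", "")) for name in names}


if __name__ == "__main__":
    for name, directory in report().items():
        print(f"{name}: {directory or 'not found on PATH'}")
    # Libraries are searched in this order, so the MVAPICH lib directory
    # should come before any MPICH one.
    print("LD_LIBRARY_PATH order:")
    for directory in os.environ.get("LD_LIBRARY_PATH", "").split(os.pathsep):
        if directory:
            print("  " + directory)
```

Run it once in an interactive shell and once from inside a TORQUE job
script. If the directories differ, the job is picking up a different MPI
than the one you tested by hand.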
Hope that helps.
zhyang at lzu.edu.cn wrote:
> I have a cluster that includes 14 InfiniBand nodes. I want to know whether InfiniBand is actually faster than 1000M Ethernet. I installed MVAPICH, but when I submit jobs through TORQUE (using MPICH2 and MVAPICH), I find no difference in job running time. I read the MPICH2 documentation, and it says it supports InfiniBand. Has anybody used InfiniBand nodes? How can I use the InfiniBand nodes? Do I need to recompile TORQUE, or MPICH2 and MVAPICH, with some additional parameters?