[torqueusers] MPI job submitted with TORQUE does not use InfiniBand if running and start nodes overlap [2]

Guilherme Menegon Arantes garantes at iq.usp.br
Wed Jun 19 11:55:50 MDT 2013


Hi there,

I am using Intel MPI (4.1.0.024 from ICS 2013.0.028) to run my parallel
application (Gromacs 4.6.1 molecular dynamics) on an SGI cluster with
CentOS 6.2 and Torque 2.5.12.

When I submit an MPI job with Torque that starts and runs on the same
2 nodes, MPI startup fails to negotiate InfiniBand (IB) and internode
communication falls back to Ethernet. This is my job script:

#PBS -l nodes=n001:ppn=32+n002:ppn=32
#PBS -q normal
source /opt/intel/impi/4.1.0.024/bin64/mpivars.sh
source /opt/progs/gromacs/bin/GMXRC.bash
cd $PBS_O_WORKDIR/
export I_MPI_DEBUG=2
mpiexec.hydra -machinefile macs -np 64 mdrun_mpi >& md.out

Of course the machinefile macs could be generated from $PBS_NODEFILE,
but it was hard-coded in this example. The output is:

[54] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
...
[45] MPI startup(): DAPL provider ofa-v2-mlx4_0-1
...
[33] MPI startup(): DAPL provider <NULLstring> on rank 0:n001 differs from ofa-v2-mlx4_0-1(v2.0) on rank 33:n002
...
[0] MPI startup(): shm and tcp data transfer modes
...
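For completeness, here is a minimal sketch of how macs could be derived
from $PBS_NODEFILE instead of being hard-coded. It assumes the node file
lists each hostname once per allocated core, and it writes the
host:count form of machinefile. The helper name make_machinefile is
only illustrative and is not part of TORQUE or Intel MPI.

```shell
# Sketch: build a host:count machinefile from a TORQUE-style node file.
# make_machinefile is a hypothetical helper name used for illustration.
make_machinefile() {
    # $1: node file with one hostname per allocated core
    # $2: machinefile to write
    # Count how often each host appears and print "host:count".
    sort "$1" | uniq -c | awk '{ print $2 ":" $1 }' > "$2"
}

# Inside a job script this could be called as:
#   make_machinefile "$PBS_NODEFILE" macs
```

With the request shown in the job script above, this should give one
line per node, each with a count of 32.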

However, MPI negotiates IB correctly if I run the same mpiexec.hydra
line from the console, either logged in to n001 (one of the running
nodes) or logged in to another node, say the admin node. It also works
if I submit the TORQUE job with a start node that is not one of the
running nodes (-machinefile macs still points to n001 and n002), for
example with #PBS -l nodes=n003 and the rest of the script identical to
the above. This is the output of a successful run:

[55] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
...
[29] MPI startup(): DAPL provider ofa-v2-mlx4_0-1
...
[0] MPI startup(): shm and dapl data transfer modes
...
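To compare runs quickly, the selected transport can be pulled out of the
debug output. This is a rough sketch that assumes the "data transfer
modes" line has the format shown in the two outputs above. The function
name selected_fabric is hypothetical.

```shell
# Sketch: print the transports reported by I_MPI_DEBUG output,
# e.g. "shm and tcp" or "shm and dapl".
# selected_fabric is a hypothetical helper name used for illustration.
selected_fabric() {
    # $1: file with the captured job output (md.out in the job script)
    # Keep only the text between "MPI startup(): " and
    # " data transfer modes" from the first matching line.
    sed -n 's/.*MPI startup(): \(.*\) data transfer modes.*/\1/p' "$1" | head -n 1
}

# Example call after a run:
#   selected_fabric md.out
```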

Any tips on what is going wrong? Please let me know if you need more
info. This has also been posted to the Intel MPI forum, but your help
is appreciated too.

Cheers,

Guilherme

--

Prof. Dr. Guilherme Menegon Arantes

Instituto de Química
Universidade de São Paulo
Av. Prof. Lineu Prestes, 748
São Paulo          05508-000
Brasil
Fone: 55-11-30913848
http://gaznevada.iq.usp.br/
___________________________________



