[torqueusers] mpiexec problems under torque+maui

Jim Kusznir jkusznir at gmail.com
Fri Oct 5 09:42:15 MDT 2007


Hi all:

I'm having trouble getting torque+maui working with MPI.  I'm
actually seeing two different problems.  First, the default network
interface is apparently InfiniBand, and it's failing over to
ethernet.  To keep users from seeing that failover warning, I'd like
to configure it to use ethernet only.  I added the line btl = ^openib
to openmpi-mca-params.conf on the head node, but haven't noticed any
difference.  I suspect this is an Open MPI problem.
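
For reference, the relevant line in my openmpi-mca-params.conf looks
like this:

    # disable the InfiniBand (openib) BTL so Open MPI uses tcp instead
    btl = ^openib

One thing I'm now wondering is whether the file needs to exist on the
compute nodes as well, since (as far as I understand) every MPI
process reads it at startup, not just processes on the head node.
Setting OMPI_MCA_btl=^openib in the environment of a test run might
also show whether the parameter itself is being honored.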

Second, I get a hard failure whenever MPI_Send is called.  When run
without qsub, the MPI job runs fine, so it's something that
qsub/torque/maui is doing (I think).  Here's the error:

libibverbs: Fatal: couldn't open sysfs class 'infiniband_verbs'.
--------------------------------------------------------------------------
[0,1,0]: OpenIB on host localhost was unable to find any HCAs.
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
Signal:8 info.si_errno:0(Success) si_code:1(FPE_INTDIV)
Failing at addr:0x40cc2d
[0] func:/usr/lib64/openmpi/libopal.so.0 [0x3587221dc5]
[1] func:/lib64/tls/libpthread.so.0 [0x3587b0c4f0]
[2] func:repdig_mpi(sendSeeds+0x3d) [0x40cc2d]
[3] func:repdig_mpi(main+0x3b6) [0x40c026]
[4] func:/lib64/tls/libc.so.6(__libc_start_main+0xdb) [0x358741c3fb]
[5] func:repdig_mpi [0x4030ea]
*** End of error message ***
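
A note on the backtrace: signal 8 with si_code FPE_INTDIV is an
integer divide-by-zero, and the faulting frame is our sendSeeds()
function.  Purely as a guess at the shape of the problem (only the
sendSeeds name comes from the backtrace; the rest of this sketch is
made up), code like the following dies exactly this way if the batch
system hands the job fewer ranks than an interactive run gets:

#include <mpi.h>

/* Hypothetical sketch, not the real program: if qsub allocates a
 * single slot, size == 1 and the integer division by (size - 1)
 * below raises SIGFPE with si_code FPE_INTDIV. */
static void sendSeeds(int nseeds, int size)
{
    int per_worker = nseeds / (size - 1);  /* divide-by-zero when size == 1 */
    int rank;

    for (rank = 1; rank < size; rank++)
        MPI_Send(&per_worker, 1, MPI_INT, rank, 0, MPI_COMM_WORLD);
}

int main(int argc, char **argv)
{
    int rank, size, dummy;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0)
        sendSeeds(1000, size);             /* crashes when run on 1 rank */
    else
        MPI_Recv(&dummy, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);

    MPI_Finalize();
    return 0;
}

If something like that is going on, printing the communicator size at
startup (or checking what $PBS_NODEFILE contains inside the job)
would show whether torque is handing the job fewer slots than mpiexec
gets interactively.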


I've looked through all the config files I could find and didn't see
anything that addresses either problem.  For the second (hard) error,
beyond the divide-by-zero guess above, I don't really know what I'm
looking for...

Thanks!
--Jim

