[torqueusers] MPI Problem
dbeer at adaptivecomputing.com
Wed Oct 13 13:06:41 MDT 2010
This is on behalf of a user. I don't know if any of you have experience with this, but I thought this would be a good place to ask. If anyone can help, please copy Bermudez.Luis at orbital.com on your reply. Many thanks. Here is his email:
I have two applications that are giving me issues when I try to run them
using more than one computing node. They both run fine outside of Torque
for any number of nodes and inside Torque if only one node is requested.
These two applications have been compiled using the Intel FORTRAN compiler.
We have other tools compiled with the same compiler which run fine on any
number of nodes. Currently I am using OpenMPI 1.4.2 and Torque 2.4.3.
I have tried excluding InfiniBand (mpirun --mca btl ^openib...) and
relying only on the high speed Ethernet, with identical results. I have also
raised the file descriptor limit to 28768 and added explicit ulimit calls
for the file descriptor limit and the locked memory size in the torque startup script.
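
The email does not say whether the raised limits were actually in effect for the processes started on the remote nodes, which may inherit their limits from the Torque daemon rather than from a login shell. A minimal sketch for checking this, assuming bash is available on the compute nodes: the function name print_limits is hypothetical, and the stack size line is an extra check that the email does not mention.

```shell
#!/bin/bash
# Hypothetical helper (not part of Torque or Open MPI): print the limits the
# email mentions so that values seen inside a Torque job can be compared with
# those of an interactive login shell on the same node.
print_limits() {
    echo "host:          $(uname -n)"
    echo "open files:    $(ulimit -n)"   # file descriptor limit
    echo "locked memory: $(ulimit -l)"   # locked-memory limit, in KiB or "unlimited"
    echo "stack size:    $(ulimit -s)"   # assumption: not mentioned in the email
}

print_limits
```

Running the script once from a login shell and once from inside a multi-node job, for example by launching it through mpirun in the job script, would show whether the two environments differ.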
In all instances, the message I am getting when the run crashes is:
mpirun noticed that process rank 4 with PID 17894 on node <node hostname>
exited on signal 11 (Segmentation fault).
Any suggestions that would help explain and fix this issue would be very much
appreciated.