[torqueusers] MPI Problem

Brock Palen brockp at umich.edu
Wed Oct 13 13:45:05 MDT 2010

He mentioned he already did ulimit, but was it on the stack size?  Many old Fortran codes did not use ALLOCATE, so large arrays end up on the stack.  Also, while you can set the stack size to unlimited in /etc/security/limits.conf,

that setting does not apply until after the pbs_mom init script starts, so the ulimit call needs to go in the pbs_mom init script itself, before the mom starts.
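As a rough sketch of what that change looks like (the init script path and daemon invocation vary by distro, so treat the names below as assumptions, not the stock script):

```shell
#!/bin/sh
# Hypothetical excerpt of an /etc/init.d/pbs_mom-style init script.
# limits.conf is applied by PAM at login, not to daemons started at boot,
# so the limit must be raised here, before pbs_mom is launched.
ulimit -s unlimited 2>/dev/null

# ... the real script would start the daemon at this point, e.g.:
# /usr/sbin/pbs_mom

# Children of this shell (and thus the mom, and thus jobs) inherit this:
echo "stack limit: $(ulimit -s)"
```

Anything pbs_mom forks, including the user's MPI ranks, inherits the limit in effect when the mom started, which is why fixing limits.conf alone is not enough.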

Have him run this in a PBS script:

pbsdsh bash -c 'ulimit -s'

and send the output.
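For context, a minimal job script wrapping that check might look like the following (the job name and resource request are just examples, not anything from his setup):

```shell
#!/bin/sh
#PBS -N stackcheck
#PBS -l nodes=2:ppn=1
#PBS -j oe

# Print the soft stack limit as seen on every allocated slot;
# any line that is not "unlimited" means the mom on that node
# started with the wrong limit.
pbsdsh bash -c 'ulimit -s'
```

Submitted with qsub, every line of output should read "unlimited" once the init script carries the ulimit call; a line like "8192" or "10240" on some node points straight at the segfaulting mom.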

Brock Palen
Center for Advanced Computing
brockp at umich.edu

On Oct 13, 2010, at 3:06 PM, David Beer wrote:

> Hi all, 
> This is on behalf of a user. I don't know if any of you have experience with this, but I thought this would be a good place to ask. If anyone can help, please copy Bermudez.Luis at orbital.com on your reply. Many thanks. Here is his email:
> I have two applications that are giving me issues when I try to run them
> using more than one computing node.  They both run fine outside of Torque
> for any number of nodes and inside Torque if only one node is requested.
> These two applications have been compiled using the Intel FORTRAN compiler.
> We have other tools compiled with the same compiler which run fine in any
> number of nodes.  Currently I am using OpenMPI 1.4.2 and Torque 2.4.3.
> I have tried to exclude the infiniband (mpirun --mca btl ^openib...) and
> rely only on the high speed Ethernet with identical results.  I have also
> opened the file descriptor limit to 28768 and added explicit ulimit calls
> for the file descriptor and size memory locked in the torque startup
> script.
> In all instances, the message I am getting when the run crashes is:
> mpirun noticed that process rank 4 with PID 17894 on node <node hostname>
> exited on signal 11 (Segmentation fault).
> Any suggestion to explain and help fixing this issue will be very much
> appreciated.
> -- 
> David Beer 
> Direct Line: 801-717-3386 | Fax: 801-717-3738
>     Adaptive Computing
>     1656 S. East Bay Blvd. Suite #300
>     Provo, UT 84606
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers
