[torqueusers] mpiexec jobs got stuck

Troy Baer tbaer at utk.edu
Tue May 12 13:40:35 MDT 2009

On Tue, 2009-05-12 at 15:19 -0400, Abhishek Gupta wrote:
> It's MPICH2.

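If you're using the mpiexec included with MPICH2, it's possible that you
are running out of privileged ports for the rsh connections to the other
nodes.  Each rsh connection binds a reserved source port (below 1024) on
the launching node, so a job with many processes can use them up.  Try
using OSC's mpiexec replacement [1] (which uses TORQUE's TM API instead
of rsh to start up the MPI processes), and see if that makes a
difference.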

[1] http://www.osc.edu/~pw/mpiexec/index.php
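
If you want to check whether reserved ports really are the bottleneck
before switching launchers, a rough sketch along these lines could be
run on the node where mpiexec starts while a job is stuck.  It assumes
Linux and the usual /proc/net/tcp layout, and it assumes the rsh client
takes its source ports from 512-1023 (the range rresvport() searches).
The function name is made up for illustration; this is not part of
MPICH2 or TORQUE.

```python
import os

# Assumed range that rresvport()-based clients such as rsh bind to.
RESERVED_LOW, RESERVED_HIGH = 512, 1023


def count_reserved_ports(lines):
    """Count distinct local TCP ports in 512-1023 from /proc/net/tcp lines.

    Only IPv4 sockets are covered; /proc/net/tcp6 would need the same pass.
    """
    ports = set()
    for line in lines:
        fields = line.split()
        # The header row and malformed rows have no "addr:port" in column 2.
        if len(fields) < 4 or ":" not in fields[1]:
            continue
        try:
            # local_address is HEXIP:HEXPORT, e.g. 0100007F:03FF
            port = int(fields[1].rsplit(":", 1)[1], 16)
        except ValueError:
            continue
        if RESERVED_LOW <= port <= RESERVED_HIGH:
            ports.add(port)
    return len(ports)


if __name__ == "__main__" and os.path.exists("/proc/net/tcp"):
    with open("/proc/net/tcp") as f:
        used = count_reserved_ports(f)
    total = RESERVED_HIGH - RESERVED_LOW + 1
    print(f"{used} of {total} reserved ports in use")
```

If the count sits near the top of the range while jobs hang, that would
be consistent with port exhaustion; a low count points somewhere else.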

Troy Baer, HPC System Administrator
National Institute for Computational Sciences, University of Tennessee
Phone:  865-241-4233
