[torqueusers] mpiexec not running on requested # of processors

Justin Finnerty justin.finnerty at uni-oldenburg.de
Thu Oct 9 09:14:51 MDT 2008


I think the problem is that your job is not really being launched under
PBS. As far as I know, the mpiexec program most people use with MPICH
knows about Torque, but the mpiexec that ships with MPICH2 does not.
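
One quick way to see what Torque actually granted the job, independent
of any MPI launcher, is to look at PBS_NODEFILE from inside the job
script. The following is only a sketch: the function name pbs_slots is
made up for illustration, and it assumes the usual Torque behaviour of
writing one line to PBS_NODEFILE per allocated processor slot.

```shell
# Hypothetical helper: print how many processor slots Torque allocated.
# Assumes PBS_NODEFILE holds one hostname per allocated slot.
pbs_slots() {
    if [ -z "${PBS_NODEFILE:-}" ] || [ ! -r "${PBS_NODEFILE}" ]; then
        echo "not running under PBS (PBS_NODEFILE unset or unreadable)" >&2
        return 1
    fi
    # grep -c . counts the non-empty lines and prints a bare number.
    grep -c . "${PBS_NODEFILE}"
}
```

If the number printed inside the job matches what was requested but
mpiexec still starts fewer processes, the launcher is most likely not
reading the Torque allocation.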

I am not sure if there is an easier way to do this, but we use the
equivalent of the following (where MPI_PATH is set to the MPICH2
install directory).

-------- %< ---------
# Using bash shell!!
export PATH=${PATH}:${MPI_PATH}/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${MPI_PATH}/lib
# SIGKILL cannot be trapped, so only catch the signal the shell can handle
trap mpdallexit SIGTERM
# rm -f is silent if the file does not exist, so no existence test is needed
rm -f MPI_NODEFILE
for NODE in `sort -u ${PBS_NODEFILE}`
do
  # -x -F matches the whole line literally, so node1 does not also count node10
  NCOUNT=`grep -c -x -F "$NODE" ${PBS_NODEFILE}`
  echo "$NODE:$NCOUNT" >> MPI_NODEFILE
done
# Assumes all machines have the same number of CPUs!!
mpdboot --rsh=/usr/bin/rsh --ncpus=${NCOUNT} --file=MPI_NODEFILE \
  --mpd=mpd
COUNT=`wc -l < ${PBS_NODEFILE}`
sleep 5
mpdcheck -f MPI_NODEFILE
mpiexec -machinefile MPI_NODEFILE -n ${COUNT} [your script]
rm -f MPI_NODEFILE
mpdallexit
-------- %< -----------
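
The per-node counting step can also be written as a single pass with
sort and uniq -c, which counts exact hostnames (so node1 and node10
stay separate). This is a minimal sketch, assuming the same host:count
layout as MPI_NODEFILE above; the function name make_mpd_hosts is
hypothetical and not part of Torque or MPICH2.

```shell
# Hypothetical helper: turn a PBS-style nodefile (one hostname per slot)
# into "host:count" lines, one line per distinct host.
make_mpd_hosts() {
    # uniq -c prints "<count> <host>"; awk rewrites that as "host:count".
    sort "$1" | uniq -c | awk '{ print $2 ":" $1 }'
}

# Usage inside a job script (PBS_NODEFILE is set by Torque):
#   make_mpd_hosts "${PBS_NODEFILE}" > MPI_NODEFILE
```

Because the output is redirected with > rather than >>, any stale
MPI_NODEFILE from an earlier run is overwritten in the same step.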

Cheers
	Justin

-- 
Dr Justin Finnerty
Rm W3-1-165         Ph 49 (441) 798 3726
Carl von Ossietzky Universität Oldenburg


