[torqueusers] mpiexec not running on requested # of processors
Justin Finnerty
justin.finnerty at uni-oldenburg.de
Thu Oct 9 09:14:51 MDT 2008
I think the problem is that you are not really running your job under
PBS. As far as I know, the mpiexec program most people use with MPICH
knows about Torque, but the MPICH2 mpiexec does not.
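A quick way to see whether a script is really running under PBS is to look for the node file that Torque hands to the job. This is only a rough sketch: the function name in_pbs_job is made up for illustration, and it assumes that PBS_NODEFILE is set only inside a job.

```shell
#!/bin/bash
# Hypothetical helper (not part of Torque or MPICH2): succeeds when the
# PBS_NODEFILE variable is set and points at a readable file.
in_pbs_job() {
    [ -n "${PBS_NODEFILE:-}" ] && [ -r "${PBS_NODEFILE}" ]
}

if in_pbs_job; then
    # The node file has one line per allocated processor.
    echo "Under PBS: $(wc -l < "${PBS_NODEFILE}") processor slots allocated"
else
    echo "PBS_NODEFILE is not set or not readable; probably not under PBS"
fi
```

If this reports that PBS_NODEFILE is missing, mpiexec has no way of learning which nodes the job was given.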
I am not sure whether there is an easier way to do this, but we use the
equivalent of the following (where MPI_PATH is set to the MPICH2
directory).
-------- %< ---------
# Using bash shell!!
export PATH=${PATH}:${MPI_PATH}/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${MPI_PATH}/lib
trap mpdallexit SIGTERM
# SIGKILL cannot be trapped, so no handler is installed for it.
if [ -e MPI_NODEFILE ]
then
rm -f MPI_NODEFILE
fi
for NODE in `sort -u "${PBS_NODEFILE}"`
do
    # -x -F: match the whole line literally, so node1 does not also count node10
    NCOUNT=`grep -c -x -F -- "$NODE" "${PBS_NODEFILE}"`
    echo "$NODE:$NCOUNT" >> MPI_NODEFILE
done
# Assumes all machines have the same number of CPUs!!
mpdboot --rsh=/usr/bin/rsh --ncpus=${NCOUNT} --file=MPI_NODEFILE --mpd=mpd
COUNT=`wc -l < ${PBS_NODEFILE}`
sleep 5
mpdcheck -f MPI_NODEFILE
mpiexec -machinefile MPI_NODEFILE -n ${COUNT} [your script]
rm -f MPI_NODEFILE
mpdallexit
-------- %< -----------
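The host:count file can also be built in a single pass instead of running one grep per node. The following is only a sketch of that alternative, not what we run: the function name make_mpd_hosts is made up for illustration, and it assumes the PBS node file holds one bare hostname per line (one line per allocated processor).

```shell
#!/bin/bash
# Hypothetical helper: print "host:count" lines for a PBS-style node file.
# sort groups identical hostnames, uniq -c prefixes each with its count,
# and awk swaps the two fields into the host:count form.
make_mpd_hosts() {
    sort "$1" | uniq -c | awk '{ print $2 ":" $1 }'
}

# Example use inside a job script (file name as in the script above):
# make_mpd_hosts "${PBS_NODEFILE}" > MPI_NODEFILE
```

Because uniq -c compares whole lines, a host called node1 is counted separately from node10.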
Cheers
Justin
--
Dr Justin Finnerty
Rm W3-1-165 Ph 49 (441) 798 3726
Carl von Ossietzky Universität Oldenburg