[torqueusers] MPICH2 support in TORQUE: do I need to run mpdboot myself?

Martin MOKREJŠ mmokrejs at ribosome.natur.cuni.cz
Wed Jul 2 07:15:50 MDT 2008

  I have spent some hours trying to figure out whether I have to run
mpdboot myself to start the mpd daemons, or whether they should be
started through a system /etc/init.d/ script, given that they listen
on a randomly chosen port (I do not understand what happens when one
of the machines has to be rebooted and, after booting, does not know
the randomly chosen port number). Although the docs at
http://www.clusterresources.com/torquedocs21/7.1mpi.shtml
mention MPICH2, they do not say enough.

  I have found some scripts used by people, notably:

  mpdboot --totalnum=`cat $PBS_NODEFILE | uniq | wc -l` -f $PBS_NODEFILE
  mpiexec -n `cat $PBS_NODEFILE | wc -l` a.out

but that gives me:

mpdroot: cannot connect to local mpd at: /tmp/mpd2.console_root
    probable cause:  no mpd daemon on this machine
    possible cause:  unix socket /tmp/mpd2.console_root has been removed
mpiexec_node004 (__init__ 1190): forked process failed; status=255
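
A minimal sketch of how the two counts in the snippet above can be derived, assuming $PBS_NODEFILE holds one line per allocated slot. The fallback node file and its host names are hypothetical and exist only so the sketch can be run outside a job. It uses sort -u instead of uniq because uniq collapses only adjacent duplicate lines. It does not address the mpd connection error itself.

```shell
#!/bin/sh
# Outside a Torque job, fake a node file (hypothetical hosts) so this can be run.
if [ -z "${PBS_NODEFILE:-}" ]; then
    PBS_NODEFILE=$(mktemp)
    printf '%s\n' node001 node001 node002 node002 > "$PBS_NODEFILE"
fi

# One line per allocated slot, so the line count is the process count.
NPROCS=$(wc -l < "$PBS_NODEFILE" | tr -d ' ')

# Number of distinct hosts; sort -u also catches non-adjacent duplicates.
NNODES=$(sort -u "$PBS_NODEFILE" | wc -l | tr -d ' ')

echo "nodes=$NNODES procs=$NPROCS"

# Inside a job these values would feed the commands quoted above, e.g.
#   mpdboot --totalnum=$NNODES -f $PBS_NODEFILE
#   mpiexec -n $NPROCS a.out
```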

I also found:

#PBS -l nodes=8:ppn=8
#PBS -q workq
#PBS -r n
#PBS -l walltime=00:35:00
set NNODES=8
set NCPUS=64
cat $PBS_NODEFILE > nodes
# write each distinct host, without its domain part, to the machinefile
foreach NN (`cat $PBS_NODEFILE | uniq`)
 echo `echo $NN | cut -f1 -d.` >> machinefile.$PBS_JOBID
end
mpdboot -n $NNODES -f $PBS_O_WORKDIR/machinefile.$PBS_JOBID -v
mpiexec -n $NCPUS $PBS_O_WORKDIR/a.out
unset NNODES
unset NCPUS
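
The machinefile step of the script above can also be sketched in plain sh, with the node count read from the file instead of being hard-coded. This is only an illustration under assumptions: the fallback node file, its example.org host names and the temporary machinefile location are hypothetical, and nothing here starts mpd.

```shell
#!/bin/sh
# Outside a Torque job, fake a node file (hypothetical hosts) so this can be run.
if [ -z "${PBS_NODEFILE:-}" ]; then
    PBS_NODEFILE=$(mktemp)
    printf '%s\n' node001.example.org node001.example.org \
        node002.example.org > "$PBS_NODEFILE"
fi

# Hypothetical location; the script above uses machinefile.$PBS_JOBID instead.
MACHINEFILE=$(mktemp)

# Keep only the short host name (text before the first dot), one line per host.
cut -f1 -d. "$PBS_NODEFILE" | sort -u > "$MACHINEFILE"

# Derive the node count from the machinefile rather than setting it by hand.
NNODES=$(wc -l < "$MACHINEFILE" | tr -d ' ')

cat "$MACHINEFILE"
echo "NNODES=$NNODES"
```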


From this Open MPI FAQ (http://www.open-mpi.org/faq/?category=tm), which Garrick
pointed out in the mailing list archives, it is still unclear how this works
(how to configure systems with MPICH2+Torque to get this behaviour). Also,
nowhere is it explained how to start a parallel computation using the mpiexec
bundled with MPICH2, or how that differs from using the mpiched from

It would be nice if somebody could clarify what to configure and how to use it.
