[torqueusers] MPICH2 support in TORQUE: do I need to run mpdboot myself?

Martin MOKREJŠ mmokrejs at ribosome.natur.cuni.cz
Wed Jul 2 07:15:50 MDT 2008


Hi,
  I have spent some hours trying to figure out whether or not I have
to run mpdboot myself, through a system /etc/init.d/ script, to start
the mpd daemons at the randomly chosen port. (I do not understand
what happens if one of the machines has to be rebooted and, upon
bootup, does not know the randomly chosen port number.) Although
the docs at http://www.clusterresources.com/torquedocs21/7.1mpi.shtml
mention MPICH2, they do not say enough.

  I have found some scripts used by people, notably:
http://www.cct.lsu.edu/~hsunda3/doc/#id241705

  mpdboot --totalnum=`cat $PBS_NODEFILE | uniq | wc -l` -f $PBS_NODEFILE
  mpiexec -n `cat $PBS_NODEFILE | wc -l` a.out
  mpdallexit

but that gives me:

mpdroot: cannot connect to local mpd at: /tmp/mpd2.console_root
    probable cause:  no mpd daemon on this machine
    possible cause:  unix socket /tmp/mpd2.console_root has been removed
mpiexec_node004 (__init__ 1190): forked process failed; status=255
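
For reference, here is a minimal sketch of what the two backtick counts
in the mpdboot/mpiexec lines quoted above compute, assuming that
$PBS_NODEFILE contains one line per allocated processor slot. The
function names count_hosts and count_slots are made up for
illustration, and sort -u is used in place of uniq so that repeated
hostnames are collapsed even when they are not adjacent. This does not
address the mpdroot error itself.

```shell
#!/bin/sh
# Count the distinct hosts in a PBS-style node file
# (one line per processor slot, hostnames may repeat).
count_hosts() {
    sort -u "$1" | wc -l | tr -d ' '
}

# Count the total number of processor slots (one per line).
count_slots() {
    wc -l < "$1" | tr -d ' '
}

# Hypothetical use inside a job script, mirroring the quoted commands
# (left commented out so the sketch runs outside of a TORQUE job):
# mpdboot --totalnum="$(count_hosts "$PBS_NODEFILE")" -f "$PBS_NODEFILE"
# mpiexec -n "$(count_slots "$PBS_NODEFILE")" ./a.out
# mpdallexit
```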


I also found:
http://www.cita.utoronto.ca/mediawiki/index.php/Sunnyvale#Submitting_Jobs

#!/bin/csh 
#PBS -l nodes=8:ppn=8
#PBS -q workq 
#PBS -r n
#PBS -l walltime=00:35:00
set NNODES=8
set NCPUS=64
cd $PBS_O_WORKDIR
cat $PBS_NODEFILE > nodes
foreach NN (`cat $PBS_NODEFILE | uniq`)
 echo `echo $NN | cut -f1 -d.` >> machinefile.$PBS_JOBID
end
mpdboot -n $NNODES -f $PBS_O_WORKDIR/machinefile.$PBS_JOBID -v
mpiexec -n $NCPUS $PBS_O_WORKDIR/a.out 
mpdallexit
unset NNODES
unset NCPUS
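
The foreach loop in that csh script builds a machine file of short
hostnames, one per node. A rough Bourne-shell equivalent, as a sketch
only: the function name short_hosts is hypothetical, and it assumes
the node file holds one hostname per line.

```shell
#!/bin/sh
# Print each distinct host from a node file once, with the domain
# part (everything after the first dot) stripped.
short_hosts() {
    cut -d. -f1 "$1" | sort -u
}

# Hypothetical use inside a job script (commented out so the sketch
# runs outside of a TORQUE job):
# short_hosts "$PBS_NODEFILE" > "$PBS_O_WORKDIR/machinefile.$PBS_JOBID"
```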


https://www.liniac.upenn.edu/wiki/tiki-index.php?page=LAM+with+Torque
http://wwwas.oat.ts.astro.it/planck/index.php?option=com_content&task=view&id=30&Itemid=46

From this Open MPI FAQ (http://www.open-mpi.org/faq/?category=tm), which Garrick
pointed out in the mailing list archives, it is still unclear how it works
(how to configure systems with MPICH2+TORQUE to get this behaviour). Also,
nowhere is it explained how to start a parallel computation using the mpiexec
bundled with MPICH2, or how that differs from using the mpiexec from
http://www.osc.edu/~pw/mpiexec


It would be nice if somebody could clarify what to configure and how to use it.
Thanks,
Martin

