[Mauiusers] MAUI + MPICH2 question (init script + scheduling on the wrong node)

Garrick Staples garrick at usc.edu
Wed Jul 6 09:46:45 MDT 2005


On Wed, Jul 06, 2005 at 10:52:05AM +0200, Alexandre Le Bouthillier alleged:
> 
> Dear members,
> 
> We have a cluster running Torque, Maui, and MPICH2 on SUSE 9.2.
> 
> Quick question: when a job is submitted with qsub to run mpiexec with n
> processes and n nodes are reserved, how does MPI know which nodes to run on?
> 
> Here is the setup:
> # at cluster startup
> mpdboot -n 12 -F mpd.hosts
> 
> qsub script
> 
> The script contains a request for 4 nodes and an mpiexec with -n 4, for example.
> 
> After a quick test, I find that MPI doesn't run the jobs on the reserved
> nodes.
> 1) How do you link Maui and MPI together?
> 
> 2) The problem with mpdboot, which is run on the master node only, is that if
> a node restarts, it doesn't rejoin the mpd ring.  Does anyone have an init
> script for SUSE with mpd?

What you want to do is run mpdboot and mpdshutdown inside your job script,
using $PBS_NODEFILE as the host list.

Or consider using the replacement mpiexec (it actually predates the 'mpiexec'
command from mpich2) from http://www.osc.edu/~pw/mpiexec because it integrates
tightly with the pbs_mom daemons.  This replaces mpdboot, mpdshutdown, mpirun,
and mpich2's mpiexec.

This subject is touched on lightly here:
http://www.clusterresources.com/products/torque/docs/3.2mpi.shtml
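
For the first suggestion, starting the mpd ring inside the job script, a
minimal sketch might look like the following. Several things here are
assumptions rather than anything confirmed in this thread: the program name
./my_mpi_program is hypothetical, the helper function names are made up for
illustration, the host file is passed to mpdboot with -f, and mpdallexit is
used as the teardown step. Check all of these against your MPICH2 install.
$PBS_NODEFILE can list a host once per allocated processor, so the sketch
boots one mpd per unique host and starts one process per line of the file.

```shell
#!/bin/sh
#PBS -l nodes=4
# Sketch only: start an mpd ring on the nodes Torque allocated to this job,
# run the MPI program on that ring, then tear the ring down again.

# Write the unique hostnames from a Torque node file ($1) to a new file ($2).
unique_hosts() {
    sort -u "$1" > "$2"
}

# Print the number of non-empty lines in a file.
count_lines() {
    grep -c . "$1"
}

run_job() {
    hosts="${TMPDIR:-/tmp}/mpd.hosts.$$"
    unique_hosts "$PBS_NODEFILE" "$hosts"
    nhosts=$(count_lines "$hosts")          # one mpd per allocated host
    nprocs=$(count_lines "$PBS_NODEFILE")   # one MPI process per allocated slot

    # Boot the ring on the allocated hosts only (assumed flags: -n and -f).
    mpdboot -n "$nhosts" -f "$hosts" || return 1

    # ./my_mpi_program is a placeholder for the real executable.
    mpiexec -n "$nprocs" ./my_mpi_program
    status=$?

    # Shut the ring down so no mpd daemons outlive the job.
    mpdallexit
    rm -f "$hosts"
    return "$status"
}

# Only run when Torque has supplied a node file, i.e. inside a real job.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    run_job
fi
```

Because the ring is created from $PBS_NODEFILE and lives only for the
duration of the job, mpiexec can only place processes on the nodes Maui
reserved, and a rebooted node does not need to rejoin a cluster-wide ring.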

-- 
Garrick Staples, Linux/HPCC Administrator
University of Southern California