[Mauiusers] MAUI + MPICH2 question (init script + scheduling on
the wrong node)
Garrick Staples
garrick at usc.edu
Wed Jul 6 09:46:45 MDT 2005
On Wed, Jul 06, 2005 at 10:52:05AM +0200, Alexandre Le Bouthillier alleged:
>
> Dear members,
>
> We have a cluster with Torque, Maui and MPICH2 on SUSE 9.2.
>
> Quick question: when a qsub is submitted for an mpiexec with n processes and n
> nodes are reserved, how does MPI know which nodes to run on?
>
> Here is the setup
> #at cluster startup
> mpdboot -n 12 -F mpd.hosts
>
> qsub script
>
> The script contains a request for 4 nodes and an mpiexec with -n 4, for example.
>
> After doing some quick tests, I find that MPI doesn't run the jobs on the
> reserved nodes.
> 1) How do you link Maui and MPI together?
>
> 2) The problem with mpdboot, which is run on the master node only, is that if a
> node restarts, it doesn't rejoin the MPD ring. Does anyone have an init script
> for SUSE with mpd?
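What you want to do is run mpdboot and mpdallexit (the MPD shutdown command)
inside your job script, using $PBS_NODEFILE as the host list. That way the MPD
ring is built per job, from only the nodes that Torque and Maui allocated to it.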
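
A minimal sketch of such a job script follows. It assumes MPICH2's MPD tools
(mpdboot with -n and -f, mpiexec, mpdallexit) are on the PATH of the compute
nodes. The job name, the nodes=4 request, the helper function names, and
./my_mpi_program are placeholders for illustration, not anything from the
original setup.

```shell
#!/bin/sh
#PBS -l nodes=4
#PBS -N mpd-example

# Print each host in a PBS-style node file once. Torque lists a host once
# per allocated processor, so the raw file can contain duplicates.
unique_hosts() {
    sort -u "$1"
}

# Boot an MPD ring on the allocated hosts, run the program, then shut the
# ring down. $1 = node file; remaining arguments = program and its arguments.
run_in_ring() {
    nodefile=$1
    shift
    hostfile=$(mktemp) || return 1
    unique_hosts "$nodefile" > "$hostfile"
    nhosts=$(wc -l < "$hostfile" | tr -d ' ')   # one mpd per host
    nprocs=$(wc -l < "$nodefile" | tr -d ' ')   # one MPI process per slot

    mpdboot -n "$nhosts" -f "$hostfile" || { rm -f "$hostfile"; return 1; }
    mpiexec -n "$nprocs" "$@"
    status=$?
    mpdallexit            # tear the ring down even if the program failed
    rm -f "$hostfile"
    return "$status"
}

# Only run when Torque has provided a node file, i.e. inside a real job.
if [ -n "${PBS_NODEFILE:-}" ]; then
    run_in_ring "$PBS_NODEFILE" ./my_mpi_program   # hypothetical program name
fi
```

Submitted with qsub, this would boot the ring on exactly the hosts listed in
$PBS_NODEFILE, so no cluster-wide mpdboot at startup (and no init script to
rejoin restarted nodes) should be needed.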
Alternatively, consider the replacement mpiexec from
http://www.osc.edu/~pw/mpiexec (it actually predates the 'mpiexec' command
shipped with mpich2). It integrates tightly with the pbs_mom daemons, so
processes are started on the nodes the job was allocated. It replaces mpdboot,
mpdallexit, mpirun, and mpich2's own mpiexec.
This subject is lightly touched on here:
http://www.clusterresources.com/products/torque/docs/3.2mpi.shtml
--
Garrick Staples, Linux/HPCC Administrator
University of Southern California