[torqueusers] qsub and mpiexec -f machinefile

Tiago Silva (Cefas) tiago.silva at cefas.co.uk
Thu Feb 20 02:51:24 MST 2014


Thanks, this seems promising. Before I try building with OpenMPI, one question: if I parse PBS_NODEFILE to produce my own machinefile for mpiexec, for instance following my previous example:

n100
n100
n101
n101
n101
n101

won't mpiexec start the MPI processes with ranks 0-1 on n100 and ranks 2-5 on n101? That's what I think it does when I don't use qsub.

Tiago
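
For illustration, the mapping asked about above can be sketched in Python. This assumes the launcher hands out ranks in the order the machinefile lists hosts, one rank per line. That is an assumption about the MPI implementation in use, so it is worth confirming on the actual cluster, for example by printing the hostname from each rank. The function name ranks_by_host is hypothetical.

```python
from collections import OrderedDict


def ranks_by_host(machinefile_lines):
    """Map each host to the ranks it would receive.

    Assumption: one rank per machinefile line, assigned in file order.
    """
    mapping = OrderedDict()
    rank = 0
    for line in machinefile_lines:
        host = line.strip()
        if not host:
            continue  # skip blank lines
        mapping.setdefault(host, []).append(rank)
        rank += 1
    return mapping


# The six-entry machinefile from the message above.
example = ["n100", "n100", "n101", "n101", "n101", "n101"]
for host, ranks in ranks_by_host(example).items():
    print(host, ranks)
```

Under that assumption the sketch reports ranks 0-1 for n100 and ranks 2-5 for n101, which is the layout described in the question.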

> -----Original Message-----
> From: torqueusers-bounces at supercluster.org [mailto:torqueusers-
> bounces at supercluster.org] On Behalf Of Gus Correa
> Sent: 19 February 2014 15:11
> To: Torque Users Mailing List
> Subject: Re: [torqueusers] qsub and mpiexec -f machinefile
> 
> Hi Tiago
> 
> The Torque/PBS node file is available to your job script through the
> environment variable $PBS_NODEFILE.
> This file has one line listing the node name for each processor/core
> that you requested.
> Just do a "cat $PBS_NODEFILE" inside your job script to see how it
> looks.
> Inside your job script, and before the mpiexec command, you can run a
> brief auxiliary script to create the machinefile you need from the
> $PBS_NODEFILE.
> You will need to create this auxiliary script, tailored to your
> application.
> Still, this method won't bind the MPI processes to the appropriate
> hardware components (cores, sockets, etc.), in case this is also part
> of your goal.
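
A minimal sketch of such an auxiliary script, in Python, might look like the following. It is not from the thread: the function names, the output file name "machinefile" and the layout [2, 4] are all hypothetical, and it assumes the job requested at least as many cores on each node as the layout needs (for instance 4 cores on each of 2 nodes).

```python
import os


def unique_nodes(nodefile_lines):
    """Return the allocated node names in first-seen order, without repeats."""
    seen = []
    for line in nodefile_lines:
        name = line.strip()
        if name and name not in seen:
            seen.append(name)
    return seen


def build_machinefile(nodefile_lines, procs_per_node):
    """Repeat the i-th allocated node procs_per_node[i] times."""
    nodes = unique_nodes(nodefile_lines)
    if len(procs_per_node) > len(nodes):
        raise ValueError("layout needs more nodes than the job was given")
    entries = []
    for node, count in zip(nodes, procs_per_node):
        # Each line of the node file stands for one core granted on that node.
        granted = sum(1 for line in nodefile_lines if line.strip() == node)
        if count > granted:
            raise ValueError("layout asks for more cores on %s than granted" % node)
        entries.extend([node] * count)
    return entries


if __name__ == "__main__":
    nodefile = os.environ.get("PBS_NODEFILE")
    if nodefile:  # only set inside a Torque/PBS job
        with open(nodefile) as fh:
            allocated = fh.readlines()
        layout = [2, 4]  # hypothetical: 2 ranks on the first node, 4 on the second
        with open("machinefile", "w") as out:
            out.write("\n".join(build_machinefile(allocated, layout)) + "\n")
```

The job script would run this before the mpiexec line and then call mpiexec -f machinefile -np 6 ./bin as in the original question. As noted above, this only controls which node each rank lands on, not the binding to cores or sockets.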
> 
> Having said that, if you are using OpenMPI, it can be built with Torque
> support (with the --with-tm=/torque/location configuration option).
> This would give you a range of options on how to assign different
> cores, sockets, etc, to different MPI ranks/processes, directly in the
> mpiexec command, or in the OpenMPI runtime configuration files.
> This method wouldn't require creating the machinefile from the
> PBS_NODEFILE.
> This second approach has the advantage of allowing you to bind the
> processes to cores, sockets, etc.
> 
> I hope this helps,
> Gus Correa
> 
> On 02/19/2014 07:40 AM, Tiago Silva (Cefas) wrote:
> > Hi,
> >
> > My MPI code is normally executed across a set of nodes with something like:
> >
> > mpiexec -f machinefile -np 6 ./bin
> >
> > where the machinefile has 6 entries with node names, for instance:
> >
> > n01
> > n01
> > n02
> > n02
> > n02
> > n02
> >
> > Now the issue here is that this list has been optimised to balance the
> > load between nodes and to reduce internode communication. So for
> > instance model domain tiles 0 and 1 will run on n01 while tiles 2 to 5
> > will run on n02.
> >
> > Is there a way to integrate this into qsub since I don't know which
> > nodes will be assigned before submission? Or in other words can I
> > control grouping processes in one node?
> >
> > In my example I used 6 processes for simplicity but normally I
> > parallelise across 4-16 nodes and >100 processes.
> >
> > Thanks,
> >
> > tiago
> >
> > This email and any attachments are intended for the named recipient
> > only. Its unauthorised use, distribution, disclosure, storage or
> > copying is not permitted. If you have received it in error, please
> > destroy all copies and notify the sender. In messages of a
> > non-business nature, the views and opinions expressed are the author's
> > own and do not necessarily reflect those of Cefas. Communications on
> > Cefas' computer systems may be monitored and/or recorded to secure the
> > effective operation of the system and for other lawful purposes.
> >
> > _______________________________________________
> > torqueusers mailing list
> > torqueusers at supercluster.org
> > http://www.supercluster.org/mailman/listinfo/torqueusers
> 