[torqueusers] qsub and mpiexec -f machinefile

Gus Correa gus at ldeo.columbia.edu
Wed Feb 19 08:11:28 MST 2014


Hi Tiago

The Torque/PBS node file is available to your job script
through the environment variable $PBS_NODEFILE.
This file has one line listing the node name for each processor/core
that you requested.
Just do a "cat $PBS_NODEFILE" inside your job script to see how it looks.
Inside your job script, and before the mpiexec command, you can
run a brief auxiliary script to create the machinefile
you need from the $PBS_NODEFILE.
You will need to create this auxiliary script,
tailored to your application.
Note, however, that this method won't bind the MPI processes to the
appropriate hardware components (cores, sockets, etc.), in case that
is also part of your goal.
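For example, something along these lines (a rough, untested sketch;
the resource request and the 2+4 split are just placeholders taken
from your example, so adapt the policy to your application):

#!/bin/bash
#PBS -l nodes=2:ppn=4
cd $PBS_O_WORKDIR

# One line per requested core, grouped by node (e.g. n01 ... n02 ...)
cat $PBS_NODEFILE

# Entries in $PBS_NODEFILE are grouped by node, so uniq gives the
# distinct nodes in allocation order.
uniq $PBS_NODEFILE > nodes.txt
first=$(sed -n 1p nodes.txt)
second=$(sed -n 2p nodes.txt)

# Put 2 ranks on the first node and 4 on the second,
# mimicking the n01/n01/n02/n02/n02/n02 layout.
{
  echo $first; echo $first
  echo $second; echo $second; echo $second; echo $second
} > machinefile

mpiexec -f machinefile -np 6 ./bin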

Having said that, if you are using OpenMPI, it can be built with
Torque support (with the --with-tm=/torque/location configuration option).
This would give you a range of options for assigning different
cores, sockets, etc., to different MPI ranks/processes, directly on
the mpiexec command line or in the OpenMPI runtime configuration files.
This method wouldn't require creating the machinefile
from the PBS_NODEFILE.
This second approach has the advantage of allowing
you to bind the processes to cores, sockets, etc.
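For example (a sketch only; the paths are placeholders, and the
mpiexec options below use the OpenMPI 1.7.x/1.8.x syntax, while the
older 1.6.x series used options such as -npernode and -bind-to-core
instead, so check the documentation of your version):

# When building OpenMPI, point it at your Torque installation:
./configure --prefix=/opt/openmpi --with-tm=/opt/torque
make all install

# In the job script no machinefile is needed: mpiexec gets the node
# list from Torque via TM. This example starts 2 ranks per node,
# binds each rank to a core, and prints the bindings so you can
# verify the placement:
mpiexec -np 6 --map-by ppr:2:node --bind-to core --report-bindings ./bin

For an uneven split like your 2+4 layout, you could still pass an
explicit hostfile or rankfile to OpenMPI's mpiexec and combine it
with the binding options.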

I hope this helps,
Gus Correa

On 02/19/2014 07:40 AM, Tiago Silva (Cefas) wrote:
> Hi,
>
> My MPI code is normally executed across a set of nodes with something like:
>
> mpiexec -f machinefile -np 6 ./bin
>
> where the machinefile has 6 entries with node names, for instance:
>
> n01
> n01
> n02
> n02
> n02
> n02
>
> Now the issue here is that this list has been optimised to balance the
> load between nodes and to reduce internode communication. So for
> instance model domain tiles 0 and 1 will run on n01 while tiles 2 to 5
> will run on n02.
>
> Is there a way to integrate this into qsub since I don’t know which
> nodes will be assigned before submission? Or, in other words, can I
> control how processes are grouped on each node?
>
> In my example I used 6 processes for simplicity but normally I
> parallelise across 4-16 nodes and >100 processes.
>
> Thanks,
>
> tiago
>


