[torqueusers] specifying nodes for MPI jobs on small cluster

Gus Correa gus at ldeo.columbia.edu
Thu Feb 7 14:40:58 MST 2013


Hi Andrew

I never had much luck with procs=YZ,
which is probably the syntax that matches what you want to do.
Maui (the scheduler I use) seems not to understand that
syntax very well.
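
For reference, the kind of request I mean is something like the
following (just a sketch; "my_mpi_program" is a placeholder):

    #PBS -l procs=8              # 8 processors, from anywhere on the cluster
    cd $PBS_O_WORKDIR
    mpiexec -np 8 ./my_mpi_program

In my tests Maui did not reliably allocate what that asks for.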

I wouldn't rely completely on the Torque documentation.
It has good guidelines, but may have mistakes in the details.
Trial and error may be the way to check what works for you.
I wonder if the error message you see may come
from different interpretations given to the word "node"
by the Torque server (pbs_server) and the scheduler (which
may be Maui, pbs_sched, or perhaps Moab).
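
One way to do that trial and error is to print what Torque actually
assigned to a job. A minimal test script (vary the resource line and
compare the output to what you expected):

    #!/bin/bash
    #PBS -l nodes=5
    # $PBS_NODEFILE lists one line per virtual processor assigned
    echo "Slots assigned: $(wc -l < $PBS_NODEFILE)"
    sort $PBS_NODEFILE | uniq -c    # slot count per host

If the slot count or host list differs from your reading of the
documentation, that tells you how your server/scheduler pair
actually interprets the request.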

If you also want to control which nodes
(and sockets and cores) each MPI *process* is sent to,
I suggest that you build OpenMPI with Torque support.
When built with Torque support, OpenMPI
will use the nodes and processors assigned
by Torque to that job,
but you can still decide how the sockets and
cores are distributed among the various MPI processes,
through switches to mpiexec such as --bynode, --bysocket,
--bycore, or even finer control through their "rankfiles".
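
As a rough sketch (the /opt/torque path is only an assumption;
point --with-tm at your actual Torque installation):

    # build OpenMPI against Torque's task manager (TM) interface
    ./configure --with-tm=/opt/torque
    make all install

Then, inside a Torque job script, there is no need for -np or a
hostfile, because OpenMPI reads the allocation from Torque via TM:

    mpiexec --bynode ./my_mpi_program    # spread ranks across nodes

(./my_mpi_program is again a placeholder.)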

I hope this helps,
Gus Correa

On 02/07/2013 03:54 PM, Andrew Dawson wrote:
> Hi Gus,
>
> Yes, I can do that. What I would like is for users to be able to
> request the number of CPUs for an MPI job without having to care how
> these CPUs are distributed across physical nodes. If I do
>
> #PBS -l nodes=1:ppn=8
>
> then this will mean the job has to wait until there are 8 CPUs on one
> physical node before starting, correct?
>
> From the Torque documentation, it seems that I can do:
>
> #PBS -l nodes=8
>
> and this will be interpreted as 8 CPUs rather than 8 physical nodes.
> This is what I want. Unfortunately, I get an error message at submission
> time saying there are not enough resources to fulfill this request, even
> though there are 33 CPUs in the system. If on my system I do
>
> #PBS -l nodes=5
>
> then my MPI job gets sent to 5 CPUs, not necessarily on the same
> physical node, which is great and exactly what I want. I would therefore
> expect this to work for larger numbers, but it seems that at submission
> time the request is checked against the number of physical nodes rather
> than virtual processors, meaning I cannot do this! It is quite frustrating.
>
> Please ask if there is further clarification I can make.
>
> Andrew
>
>
> On 7 February 2013 19:28, Gus Correa <gus at ldeo.columbia.edu> wrote:
>
>     Hi Andrew
>
>     Not sure I understood what exactly you want to do,
>     but have you tried this?
>
>     #PBS -l nodes=1:ppn=8
>
>
>     It will request one node with 8 processors.
>
>     I hope this helps,
>     Gus Correa
>
>     On 02/07/2013 11:38 AM, Andrew Dawson wrote:
>      > Nodes file looks like this:
>      >
>      > cirrus np=1
>      > cirrus1 np=8
>      > cirrus2 np=8
>      > cirrus3 np=8
>      > cirrus4 np=8
>      >
>      > On 7 Feb 2013 16:25, "Ricardo Román Brenes" <roman.ricardo at gmail.com> wrote:
>      >
>      >     hi!
>      >
>      >     What does your node config file look like?
>      >
>      >     On Thu, Feb 7, 2013 at 3:10 AM, Andrew Dawson <dawson at atm.ox.ac.uk> wrote:
>      >
>      >         Hi all,
>      >
>      >         I'm configuring a recent Torque/Maui installation and I'm
>      >         having trouble with submitting MPI jobs. I would like MPI
>      >         jobs to specify the number of processors they require and
>      >         have those come from any available physical machine; the
>      >         users shouldn't need to specify processors per node, etc.
>      >
>      >         The torque manual says that the nodes option is mapped to
>      >         virtual processors, so for example:
>      >
>      >              #PBS -l nodes=8
>      >
>      >         should request 8 virtual processors. The problem I'm
>      >         having is that our cluster currently has only 5 physical
>      >         machines (nodes), and setting nodes to anything greater
>      >         than 5 gives the error:
>      >
>      >              qsub: Job exceeds queue resource limits MSG=cannot
>      >              locate feasible nodes (nodes file is empty or all
>      >              systems are busy)
>      >
>      >         I'm confused by this: we have 33 virtual processors
>      >         available across the 5 nodes (four 8-core machines and
>      >         one single-core machine), so my interpretation of the
>      >         manual is that I should be able to request 8 nodes, since
>      >         these should be understood as virtual processors. Am I
>      >         doing something wrong?
>      >
>      >         I tried setting
>      >
>      >         #PBS -l procs=8
>      >
>      >         but that doesn't seem to do anything; MPI stops due to
>      >         having only 1 worker available (a single core allocated
>      >         to the job).
>      >
>      >         Thanks,
>      >         Andrew
>      >
>      >         p.s.
>      >
>      >         The queue I'm submitting jobs to is defined as:
>      >
>      >         create queue normal
>      >         set queue normal queue_type = Execution
>      >         set queue normal resources_min.cput = 12:00:00
>      >         set queue normal resources_default.cput = 24:00:00
>      >         set queue normal disallowed_types = interactive
>      >         set queue normal enabled = True
>      >         set queue normal started = True
>      >
>      >         We are using Torque version 2.5.12 and Maui 3.3.1 for
>      >         scheduling.
>
>
> --
> Dr Andrew Dawson
> Atmospheric, Oceanic & Planetary Physics
> Clarendon Laboratory
> Parks Road
> Oxford OX1 3PU, UK
> Tel: +44 (0)1865 282438
> Email: dawson at atm.ox.ac.uk
> Web Site: http://www2.physics.ox.ac.uk/contacts/people/dawson
>
>
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers


