[torqueusers] specifying nodes for MPI jobs on small cluster

Andrew Dawson dawson at atm.ox.ac.uk
Fri Feb 8 09:04:09 MST 2013


For others who are interested, the guidance at
http://docs.adaptivecomputing.com/torque/Content/topics/11-troubleshooting/faq.htm#qsubNotAllow
resolves my particular issue, so thanks Michel!
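
In case that link moves: as I read the FAQ entry, the fix is to tell
pbs_server (and the queue) that more "nodes" are available than there
are physical machines, so that nodes=N requests are checked against
processor counts instead. Roughly, with the value set to at least your
total processor count ("normal" being my queue here):

    qmgr -c 'set server resources_available.nodect=33'
    qmgr -c 'set queue normal resources_available.nodect=33'

followed by a restart of pbs_server.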


On 7 February 2013 21:40, Gus Correa <gus at ldeo.columbia.edu> wrote:

> Hi Andrew
>
> I never had much luck with procs=YZ,
> which is likely the syntax that matches what you want to do.
> Maui (the scheduler I use) does not seem to understand that
> syntax very well.
>
> I wouldn't rely completely on the Torque documentation.
> It has good guidelines, but may have mistakes in the details.
> Trial and error may be the way to check what works for you.
> I wonder if the error message you see may come
> from different interpretations given to the word "node"
> by the torque server (pbs_server) and the scheduler (which
> may be Maui, pbs_sched, or perhaps Moab).
>
> If you also want to control which nodes
> (and sockets and cores) each MPI *process* is sent to,
> I suggest that you build OpenMPI with Torque support.
> When built with Torque support, OpenMPI
> will use the nodes and processors assigned
> by Torque to the job,
> but you can still decide how the various MPI
> processes are distributed across sockets and cores,
> through mpiexec switches such as --bynode, --bysocket,
> and --bycore, or with even finer control through "rankfiles".
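>
> For example (a sketch from memory, not tested on your machines),
> Torque support is enabled at configure time with something like
>
>     ./configure --with-tm=/usr/local
>
> where --with-tm points at your Torque installation prefix. Then in
> the job script you can run
>
>     mpiexec --bynode ./your_mpi_program
>
> with no -np and no hostfile, since OpenMPI takes both from Torque.
> A rankfile (passed with mpiexec -rf <file>) has entries like
>
>     rank 0=cirrus1 slot=0:0
>     rank 1=cirrus1 slot=0:1
>
> meaning rank 0 goes to node cirrus1, socket 0, core 0, and so on.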
>
> I hope this helps,
> Gus Correa
>
> On 02/07/2013 03:54 PM, Andrew Dawson wrote:
> > Hi Gus,
> >
> > Yes I can do that. What I would like is for users to be able to
> > request a number of CPUs for an MPI job without having to care how
> > those CPUs are distributed across physical nodes. If I do
> >
> > #PBS -l nodes=1:ppn=8
> >
> > then this will mean the job has to wait until there are 8 CPUs on one
> > physical node before starting, correct?
> >
> > The torque documentation seems to say I can do:
> >
> > #PBS -l nodes=8
> >
> > and this will be interpreted as 8 CPUs rather than 8 physical nodes.
> > This is what I want. Unfortunately I get the error message at submission
> > time saying there are not enough resources to fulfill this request, even
> > though there are 33 CPUs in the system. If on my system I do
> >
> > #PBS -l nodes=5
> >
> > then my MPI job gets sent to 5 CPUs, not necessarily on the same
> > physical node, which is great and exactly what I want. I would
> > therefore expect this to work for larger numbers, but it seems that
> > at submission time the request is checked against the number of
> > physical nodes rather than virtual processors, meaning I cannot do
> > this! It is quite frustrating.
> >
> > Please ask if I can clarify anything further.
> >
> > Andrew
> >
> >
> > On 7 February 2013 19:28, Gus Correa <gus at ldeo.columbia.edu> wrote:
> >
> >     Hi Andrew
> >
> >     Not sure I understood what exactly you want to do,
> >     but have you tried this?
> >
> >     #PBS -l nodes=1:ppn=8
> >
> >
> >     It will request one node with 8 processors.
> >
> >     I hope this helps,
> >     Gus Correa
> >
> >     On 02/07/2013 11:38 AM, Andrew Dawson wrote:
> >      > Nodes file looks like this:
> >      >
> >      > cirrus np=1
> >      > cirrus1 np=8
> >      > cirrus2 np=8
> >      > cirrus3 np=8
> >      > cirrus4 np=8
> >      >
> >      > On 7 Feb 2013 16:25, "Ricardo Román Brenes"
> >     <roman.ricardo at gmail.com> wrote:
> >      >
> >      >     hi!
> >      >
> >      >     What does your node config file look like?
> >      >
> >      >     On Thu, Feb 7, 2013 at 3:10 AM, Andrew Dawson
> >     <dawson at atm.ox.ac.uk> wrote:
> >      >
> >      >         Hi all,
> >      >
> >      >         I'm configuring a recent torque/maui installation and
> >      >         I'm having trouble submitting MPI jobs. I would like
> >      >         MPI jobs to specify the number of processors they
> >      >         require and have those come from any available
> >      >         physical machine; users shouldn't need to specify
> >      >         processors per node, etc.
> >      >
> >      >         The torque manual says that the nodes option is mapped to
> >      >         virtual processors, so for example:
> >      >
> >      >              #PBS -l nodes=8
> >      >
> >      >         should request 8 virtual processors. The problem I'm
> >      >         having is that our cluster currently has only 5
> >      >         physical machines (nodes), and setting nodes to
> >      >         anything greater than 5 gives the error:
> >      >
> >      >              qsub: Job exceeds queue resource limits MSG=cannot
> >      >              locate feasible nodes (nodes file is empty or all
> >      >              systems are busy)
> >      >
> >      >         I'm confused by this: we have 33 virtual processors
> >      >         available across the 5 nodes (four 8-core machines and
> >      >         one single-core), so my interpretation of the manual
> >      >         is that I should be able to request 8 nodes, since
> >      >         these should be understood as virtual processors.
> >      >         Am I doing something wrong?
> >      >
> >      >         I tried setting
> >      >
> >      >         #PBS -l procs=8
> >      >
> >      >         but that doesn't seem to do anything; MPI stops due
> >      >         to having only 1 worker available (a single core
> >      >         allocated to the job).
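> >      >
> >      >         (One way I know to check what Torque actually handed
> >      >         the job is to print the contents of $PBS_NODEFILE
> >      >         from inside it, e.g.:
> >      >
> >      >              echo "Slots allocated by Torque:"
> >      >              cat $PBS_NODEFILE
> >      >
> >      >         since that file lists one line per allocated slot.)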
> >      >
> >      >         Thanks,
> >      >         Andrew
> >      >
> >      >         p.s.
> >      >
> >      >         The queue I'm submitting jobs to is defined as:
> >      >
> >      >         create queue normal
> >      >         set queue normal queue_type = Execution
> >      >         set queue normal resources_min.cput = 12:00:00
> >      >         set queue normal resources_default.cput = 24:00:00
> >      >         set queue normal disallowed_types = interactive
> >      >         set queue normal enabled = True
> >      >         set queue normal started = True
> >      >
> >      >         and we are using torque version 2.5.12 and maui 3.3.1
> >      >         for scheduling
> >      >
> >      >



-- 
Dr Andrew Dawson
Atmospheric, Oceanic & Planetary Physics
Clarendon Laboratory
Parks Road
Oxford OX1 3PU, UK
Tel: +44 (0)1865 282438
Email: dawson at atm.ox.ac.uk
Web Site: http://www2.physics.ox.ac.uk/contacts/people/dawson