[torqueusers] specifying nodes for MPI jobs on small cluster

Andrew Dawson dawson at atm.ox.ac.uk
Thu Feb 7 13:54:11 MST 2013


Hi Gus,

Yes, I can do that. What I would like is for users to be able to request the
number of CPUs for an MPI job without having to care how those CPUs are
distributed across physical nodes. If I do

#PBS -l nodes=1:ppn=8

then the job has to wait until 8 CPUs are free on a single physical node
before starting, correct?
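
For reference, the sort of submission script I'm testing looks roughly like
this (the job name and executable are just placeholders, not our real setup):

#!/bin/bash
#PBS -N mpi-test
#PBS -q normal
#PBS -l nodes=1:ppn=8

cd $PBS_O_WORKDIR
# $PBS_NODEFILE lists one line per virtual processor allocated to the job
mpirun -np 8 -machinefile $PBS_NODEFILE ./my_mpi_program

With nodes=1:ppn=8 the $PBS_NODEFILE ends up containing 8 entries for the
same host, which as far as I understand is why the job sits queued until a
whole 8-core machine is free.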

From the Torque documentation, it seems I can do:

#PBS -l nodes=8

and this will be interpreted as 8 CPUs rather than 8 physical nodes, which is
exactly what I want. Unfortunately, I get an error at submission time saying
there are not enough resources to fulfill the request, even though there are
33 CPUs in the system. If on my system I do

#PBS -l nodes=5

then my MPI job gets sent to 5 CPUs, not necessarily on the same physical
node, which is exactly what I want. I would therefore expect the same to work
for larger numbers, but it seems that at submission time the request is
checked against the number of physical nodes rather than the number of
virtual processors, so anything above 5 is rejected. It is quite frustrating.
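
One possible workaround I have come across, but not yet verified on our
Torque 2.5.12 / Maui 3.3.1 setup, is that pbs_server apparently validates a
plain nodes=N request against resources_available.nodect, which defaults to
the number of physical nodes; raising it via qmgr is supposed to let nodes=N
be treated purely as a processor count. Roughly (the value 33 matches our
virtual processor count):

# confirm how many virtual processors each node advertises
pbsnodes -a | grep 'np = '

# raise the node-count limit on the server and the queue (untested here)
qmgr -c "set server resources_available.nodect = 33"
qmgr -c "set queue normal resources_available.nodect = 33"

If anyone can confirm whether that is the right knob, or whether this is
really controlled on the Maui side (e.g. JOBNODEMATCHPOLICY), I'd be
grateful.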

Please ask if I can clarify anything further.

Andrew


On 7 February 2013 19:28, Gus Correa <gus at ldeo.columbia.edu> wrote:

> Hi Andrew
>
> Not sure I understood what exactly you want to do,
> but have you tried this?
>
> #PBS -l nodes=1:ppn=8
>
>
> It will request one node with 8 processors.
>
> I hope this helps,
> Gus Correa
>
> On 02/07/2013 11:38 AM, Andrew Dawson wrote:
> > Nodes file looks like this:
> >
> > cirrus np=1
> > cirrus1 np=8
> > cirrus2 np=8
> > cirrus3 np=8
> > cirrus4 np=8
> >
> > On 7 Feb 2013 16:25, "Ricardo Román Brenes" <roman.ricardo at gmail.com> wrote:
> >
> >     hi!
> >
> >     What does your node config file look like?
> >
> >     On Thu, Feb 7, 2013 at 3:10 AM, Andrew Dawson <dawson at atm.ox.ac.uk> wrote:
> >
> >         Hi all,
> >
> >         I'm configuring a recent Torque/Maui installation and I'm having
> >         trouble submitting MPI jobs. I would like MPI jobs to specify the
> >         number of processors they require and have those come from any
> >         available physical machine; users shouldn't need to specify
> >         processors per node, etc.
> >
> >         The torque manual says that the nodes option is mapped to
> >         virtual processors, so for example:
> >
> >              #PBS -l nodes=8
> >
> >         should request 8 virtual processors. The problem I'm having is
> >         that our cluster currently has only 5 physical machines (nodes),
> >         and setting nodes to anything greater than 5 gives the error:
> >
> >              qsub: Job exceeds queue resource limits MSG=cannot locate
> >         feasible nodes (nodes file is empty or all systems are busy)
> >
> >         I'm confused by this: we have 33 virtual processors available
> >         across the 5 nodes (four 8-core machines and one single-core
> >         machine), so my reading of the manual is that I should be able
> >         to request 8 nodes, since these should be understood as virtual
> >         processors. Am I doing something wrong?
> >
> >         I tried setting
> >
> >         #PBS -l procs=8
> >
> >         but that doesn't seem to do anything: MPI stops because only one
> >         worker is available (a single core is allocated to the job).
> >
> >         Thanks,
> >         Andrew
> >
> >         p.s.
> >
> >         The queue I'm submitting jobs to is defined as:
> >
> >         create queue normal
> >         set queue normal queue_type = Execution
> >         set queue normal resources_min.cput = 12:00:00
> >         set queue normal resources_default.cput = 24:00:00
> >         set queue normal disallowed_types = interactive
> >         set queue normal enabled = True
> >         set queue normal started = True
> >
> >         We are using Torque version 2.5.12 with Maui 3.3.1 for
> >         scheduling.
> >
> >
>



-- 
Dr Andrew Dawson
Atmospheric, Oceanic & Planetary Physics
Clarendon Laboratory
Parks Road
Oxford OX1 3PU, UK
Tel: +44 (0)1865 282438
Email: dawson at atm.ox.ac.uk
Web Site: http://www2.physics.ox.ac.uk/contacts/people/dawson