[torqueusers] Problem with ppn and routing : Possible way to get the routing you want continued.

Garrick Staples garrick at usc.edu
Thu Dec 2 11:06:31 MST 2010


ncpus and nodes are competing ways to specify a resource request. Don't mix
them and everything works better.

ncpus pre-dates clusters and is used to specify the number of cpus on 1 node.
nodes was grafted into OpenPBS later in life to deal with clusters.

On Thu, Dec 02, 2010 at 09:31:40AM -0600, Coyle, James J [ITACD] alleged:
> I should have added that you also need the change that
> was previously mentioned:
> 
> qmgr -c 'set queue batch resources_max.nodes = 1'
> qmgr -c 'set queue fast resources_max.nodes = 2'
> and you could add 
> qmgr -c 'set queue fast resources_max.ncpus = 4'
> 
> to effect the change for the queue fast as well.
> 
> batch should come ahead of fast in the routing queue.
> 
>  - Jim C.
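
To make the effect of those limits concrete, here is a rough model of how a routing queue would pick a destination with the settings quoted in this thread. It is only a sketch: `route` and the dictionary layout are made up for illustration and are not Torque internals, and it assumes the routing queue tries destinations in order and that only the node count (not ppn) is compared for `nodes`.

```python
# Simplified model of routing-queue selection; not Torque's actual code.
# Limits mirror the qmgr settings quoted in this thread.
QUEUES = [
    ("batch", {"nodes": 1, "ncpus": 1}),  # tried first
    ("fast", {"nodes": 2, "ncpus": 4}),
]


def route(job, queues=QUEUES):
    """Return the first queue whose resources_max limits the job meets."""
    for name, limits in queues:
        # A resource the job did not request counts as 0 (an assumption).
        if all(job.get(res, 0) <= cap for res, cap in limits.items()):
            return name
    return None  # no destination accepts the job


# nodes=1:ppn=2 with ncpus=2 added by a submit filter
print(route({"nodes": 1, "ncpus": 2}))
```

Without the ncpus entry the same job would pass the batch limits, since nodes=1 is within resources_max.nodes = 1.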
> 
> >-----Original Message-----
> >From: torqueusers-bounces at supercluster.org [mailto:torqueusers-
> >bounces at supercluster.org] On Behalf Of Coyle, James J [ITACD]
> >Sent: Thursday, December 02, 2010 9:22 AM
> >To: Torque Users Mailing List
> >Subject: Re: [torqueusers] Problem with ppn and routing : Possible
> >way to get the routing you want
> >
> >J.A. Magallon,
> >
> >   I have a suggestion for this case.
> >
> >   Create a submit filter, (or modify pbs_sched) so that
> >whenever nodes=N:ppn=P is used, then the calculation C=N*P
> >is performed and the resource request is changed so that
> >ncpus=C is added.
> >
> >   Then issue
> >qmgr -c 'set queue batch resources_max.ncpus = 1'
> >
> >   Now a request of
> >
> >qsub -lnodes=1:ppn=2
> >would be changed to
> >qsub -lnodes=1:ppn=2,ncpus=2
> >
> >which would be rejected by batch (because of ncpus).
> >
> >   I am running 2.3.6, and it appears that nodes=N:ppn=P
> >takes precedence over ncpus, so you will still get the sort
> >of node packing you want; ncpus here just serves to aid the
> >routing queue.
> >
> >- Jim Coyle
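
The C=N*P calculation suggested above could look something like the following. This is a minimal sketch of the arithmetic only, not a complete submit filter: `add_ncpus` is a hypothetical helper, it handles just the simple `nodes=N[:ppn=P]` form (no `+`-joined specs, hostnames, or node properties), and it assumes ppn defaults to 1 when omitted.

```python
import re


def add_ncpus(resource_list):
    """Append ncpus=N*P to a resource list that requests nodes=N[:ppn=P]."""
    items = resource_list.split(",")
    if any(item.startswith("ncpus=") for item in items):
        return resource_list  # ncpus already given; leave the request alone
    for item in items:
        match = re.fullmatch(r"nodes=(\d+)(?::ppn=(\d+))?", item)
        if match:
            nodes = int(match.group(1))
            ppn = int(match.group(2) or 1)  # assumption: missing ppn means 1
            return f"{resource_list},ncpus={nodes * ppn}"
    return resource_list  # no simple nodes=N[:ppn=P] request found


print(add_ncpus("nodes=1:ppn=2"))
```

A real filter would still have to locate the resource list in the job script or qsub arguments and write the modified request back out; that part depends on the local setup and is left out here.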
> >
> >
> > James Coyle, PhD
> > High Performance Computing Group
> > 115 Durham Center
> > Iowa State Univ.           phone: (515)-294-2099
> > Ames, Iowa 50011           web: http://www.public.iastate.edu/~jjc
> >
> >
> >
> >>-----Original Message-----
> >>From: torqueusers-bounces at supercluster.org [mailto:torqueusers-
> >>bounces at supercluster.org] On Behalf Of J.A. Magallón
> >>Sent: Tuesday, November 30, 2010 7:29 PM
> >>To: torqueusers at supercluster.org
> >>Subject: Re: [torqueusers] Problem with ppn and routing
> >>
> >>On Tue, 30 Nov 2010 09:44:08 -0700 (MST), David Beer
> >><dbeer at adaptivecomputing.com> wrote:
> >>
> >>>
> >>>
> >>> ----- Original Message -----
> >>> > -snip-
> >>> > > set queue fast resources_max.nodes = 2:ppn=2
> >>> > -snip-
> >>> > > set queue batch resources_max.nodes = 1:ppn=1
> >>> >
> >>> > My understanding is that torque can/will only do useful comparisons on
> >>> > numeric fields so the above settings are not meaningful. You might be
> >>> > OK with resources_max.nodect (though that might not be numeric either)
> >>> > but could only filter on the number of nodes not the number of
> >>> > processes requested (and you would need a default nodes=1 which I
> >>> > would prefer not to set so we can use procs as an option...). I don't
> >>> > think this solves your problem but might point you (or others) in the
> >>> > right direction.
> >>> >
> >>> > -- Gareth
> >>>
> >>> At some point (I believe 2.5) we added the ability to use
> >>> resources_max.nodes in queue limitations, but it only sorts based on
> >>> the number of nodes, not ppn. We couldn't sort based on ppn because
> >>> of the inherent ambiguities - which is larger, nodes=1:ppn=2 or
> >>> nodes=2:ppn=1 - so we only sort based on the first number there.
> >>> This means that a job requesting nodes=1:ppn=2 will be accepted by
> >>> the batch queue.
> >>>
> >>> Additionally, if you would like to have jobs that request
> >>> nodes=2:ppn=2 and need more walltime than allowed by the fast queue,
> >>> you will have to create a new queue or modify the limits for fast.
> >>>
> >>
> >>OK, thanks. My idea was that a job would fit into a queue if it passed
> >>all conditions, nodes and ppn and walltime ....
> >>
> >>What do you mean by sorting? What do you sort?
> >>You could go probing whether a job fits (with respect to ppn and nodes)
> >>in a queue until you find a good one.
> >>
> >>My problem is that I don't depend on wall/cpu time, but I want to do
> >>something like:
> >>- If you ask for many cores per node, you can only get X nodes and your
> >>  time is limited to *:*:* (go to queue fast)
> >>- If you ask for single-core processes, you can get more nodes and live
> >>  longer (go to queue batch)
> >>
> >>How could I do that? I use pbs_sched, no MAUI/MOAB...
> >>
> >>--
> >>J.A. Magallon <jamagallon()ono!com>     \               Software is like sex:
> >>                                         \         It's better when it's free
> >>_______________________________________________
> >>torqueusers mailing list
> >>torqueusers at supercluster.org
> >>http://www.supercluster.org/mailman/listinfo/torqueusers

-- 
Garrick Staples, GNU/Linux HPCC SysAdmin
University of Southern California

Life is Good!