[torquedev] Allocating resources by CPUs

Martin Siegert siegert at sfu.ca
Mon Mar 19 12:04:16 MDT 2007

On Mon, Mar 19, 2007 at 12:18:36PM -0400, Glen Beane wrote:
> On 3/19/07, Joshua Bernstein <bjosh at lpl.arizona.edu> wrote:
> >On Mar 18, 2007, at 9:23 PM, Chris Samuel wrote:
> >> On Mon, 19 Mar 2007, Joshua Bernstein wrote:
> >>> On Mar 18, 2007, at 7:56 PM, Chris Samuel wrote:
> >>>> On Tue, 13 Mar 2007, Glen Beane wrote:
> >>>>> I agree that we should add a '-l cpus=x'  option to torque
> >>> Forgive me for being out of the loop and perhaps not following the
> >>> threads as closely as possible, but isn't there an -l ncpus=x option
> >>> in Torque?
> >> There is, but that only determines how many CPUs on an SMP node you
> >> want, not how many CPUs across the whole cluster you may need.
> >You figure out how many CPUs you want across the entire cluster
> >by simply multiplying the nodes times the ppn. Otherwise simply
> >notating nodes=x says that you want x number of cpus, since the nodes
> >argument specifies the default number of "virtual cpus",
> >not nodes (as is often confused). Adding another confusing option
> >seems a bit excessive. People are already very well confused about
> >the difference between nodes, ncpus, and ppn and how they all relate.
> >On that note, both the nodes, and ppn arguments are nicely documented
> >(including the virtual host idea), though ncpus wasn't listed last I
> >looked at the Torque documentation.
> we've already been over this
> the reason we are talking about yet another option is some users want
> the job to run exactly how they specify, and others just want it to
> run on X number of CPUs as quickly as possible and they don't care how
> many nodes are used
> you cannot satisfy both users with maui/moab and torque.  To get
> something like -l nodes=64 to run on any combination of nodes
> that equals 64 CPUs, you have to disable the exact node match
> policy
> It would be acceptable to me if there were a maui and moab option that
> would enforce the exact node match policy if :ppn is specified, and
> otherwise provide any distribution of CPUs across whatever
> number of nodes are needed to satisfy the request

The problem that I have with this solution is that it is a continuing
cause of user confusion and consequently has become a serious
support problem:

In the specification "-l nodes=x:ppn=y", x specifies the number of
nodes; in the specification "-l nodes=x", x specifies the
number of cpus, not nodes. For obvious reasons I have problems explaining
this to users; it does not make sense.
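
To make the two readings concrete, here is a minimal sketch in Python of
the interpretation described above. The function interpret_nodes_spec is
hypothetical and is not part of Torque, Maui, or Moab; it only handles the
simple forms "nodes=x" and "nodes=x:ppn=y", and the "nodes=x" case assumes
the behaviour discussed in this thread (exact node matching not enforced).

```python
def interpret_nodes_spec(spec):
    """Return (what x counts, total CPUs requested) for a simple spec.

    Hypothetical helper for illustration only; not a Torque API.
    Accepts "nodes=x" or "nodes=x:ppn=y".
    """
    parts = spec.split(":")
    key, _, value = parts[0].partition("=")
    if key != "nodes" or not value.isdigit():
        raise ValueError("unsupported spec: %r" % spec)
    x = int(value)

    ppn = None
    for part in parts[1:]:
        k, _, v = part.partition("=")
        if k != "ppn" or not v.isdigit():
            raise ValueError("unsupported spec: %r" % spec)
        ppn = int(v)

    if ppn is None:
        # Without :ppn, x is read as a number of cpus ("virtual" nodes).
        return ("cpus", x)
    # With :ppn, x is a number of nodes and y the cpus on each of them.
    return ("nodes", x * ppn)


if __name__ == "__main__":
    for spec in ("nodes=64", "nodes=8:ppn=8"):
        meaning, cpus = interpret_nodes_spec(spec)
        print("-l %s: x counts %s, %d cpus in total" % (spec, meaning, cpus))
```

Both requests ask for 64 cpus, yet x means something different in each,
which is exactly what is hard to explain to users.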

That's why I am in favour of implementing "-l ncpus=x" for clusters.
The way ncpus currently works on clusters (as far as I can tell it is
effectively equivalent to -l nodes=1:ppn=x) is useless anyway.
If really needed, one could create a configure option
to allow sysadmins to get the old behaviour - with the disadvantage that
you have to continue to support the old code.


Martin Siegert
Head, HPC at SFU
WestGrid Site Lead
Academic Computing Services                phone: (604) 291-4691
Simon Fraser University                    fax:   (604) 291-4242
Burnaby, British Columbia                  email: siegert at sfu.ca
Canada  V5A 1S6

