[torquedev] Allocating resources by CPUs

Glen Beane glen.beane at gmail.com
Mon Mar 19 12:08:17 MDT 2007


On 3/19/07, Martin Siegert <siegert at sfu.ca> wrote:
> On Mon, Mar 19, 2007 at 12:18:36PM -0400, Glen Beane wrote:
> > On 3/19/07, Joshua Bernstein <bjosh at lpl.arizona.edu> wrote:
> > >-----BEGIN PGP SIGNED MESSAGE-----
> > >Hash: SHA1
> > >
> > >
> > >On Mar 18, 2007, at 9:23 PM, Chris Samuel wrote:
> > >
> > >> On Mon, 19 Mar 2007, Joshua Bernstein wrote:
> > >>
> > >>> On Mar 18, 2007, at 7:56 PM, Chris Samuel wrote:
> > >>>> On Tue, 13 Mar 2007, Glen Beane wrote:
> > >>>>> I agree that we should add a '-l cpus=x'  option to torque
> > >>>
> > >>> Forgive me for being out of the loop and perhaps not following the
> > >>> threads as closely as possible, but isn't there an -l ncpus=x option
> > >>> in Torque?
> > >>
> > >> There is, but that only determines how many CPUs on an SMP node you
> > >> want, not how many CPUs across the whole cluster you may need.
> > >
> > >You figure out how many CPUs you want across the entire cluster
> > >by simply multiplying the nodes times the ppn. Otherwise simply
> > >notating nodes=x says that you want x number of cpus, since the nodes
> > >argument specifies the default number of "virtual cpus",
> > >not nodes (as is often confused). Adding another confusing option
> > >seems a bit excessive. People are already very confused about
> > >the difference between nodes, ncpus, and ppn and how they all relate.
> > >
> > >On that note, both the nodes, and ppn arguments are nicely documented
> > >(including the virtual host idea), though ncpus wasn't listed last I
> > >looked at the Torque documentation.
> >
> > we've already been over this
> >
> > the reason we are talking about yet another option is some users want
> > the job to run exactly how they specify, and others just want it to
> > run on X number of CPUs as quickly as possible and they don't care how
> > many nodes are used
> >
> > you cannot satisfy both users with maui/moab and torque.  To get
> > something like -l nodes=64 to run on any combination of nodes
> > that adds up to 64 CPUs, you have to disable the exact node match
> > policy
> >
> > It would be acceptable to me if there were a maui and moab option that
> > would enforce the exact node match policy if :ppn is specified, and
> > otherwise would provide any distribution of CPUs across whatever
> > number of nodes are needed to satisfy the request
>
> The problem that I have with this solution is that it is a continuing
> cause of user confusion and consequently has become a serious
> support problem:
>
> in the specification "-l nodes=x:ppn=y" x specifies the number of
> nodes; in the specification "-l nodes=x" x specifies the
> number of cpus, not nodes. For obvious reasons I have problems explaining
> this to users; it does not make sense.
>
> That's why I am in favour of implementing "-l ncpus=x" for clusters.
> The way ncpus currently works on clusters (as far as I can tell it is
> effectively equivalent to -l nodes=1:ppn=x) is useless anyway.
> If really needed, one could create a configure option
> --enable-old-ncpus-syntax
> to allow sysadmins to get the old behaviour - with the disadvantage that
> you have to continue to support the old code.

I agree that fixing the -l ncpus=x behavior for clusters would be less
confusing to users
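
To make the readings discussed in this thread concrete, here is a minimal
Python sketch of how the different request forms translate into node and
CPU counts. It only models the semantics as described in the messages
above; it is not how Torque, Maui, or Moab actually parse requests. The
function name requested_cpus and the exact_node_match flag are
hypothetical, and the ncpus branch models the proposed cluster-wide
meaning, not the current behaviour.

```python
import re


def requested_cpus(spec, exact_node_match=True):
    """Return (nodes, cpus) for a '-l' value, per the thread's description.

    nodes is None when placement across nodes is left to the scheduler.
    Hypothetical helper for illustration only.
    """
    m = re.fullmatch(r"nodes=(\d+)(?::ppn=(\d+))?", spec)
    if m:
        x = int(m.group(1))
        if m.group(2) is not None:
            # nodes=x:ppn=y -> x nodes with y CPUs each, x*y CPUs in total
            return x, x * int(m.group(2))
        if exact_node_match:
            # exact node match enabled: x separate nodes, one CPU on each
            return x, x
        # exact node match disabled: x CPUs on any combination of nodes
        return None, x
    m = re.fullmatch(r"ncpus=(\d+)", spec)
    if m:
        # proposed meaning for clusters: x CPUs anywhere in the cluster
        return None, int(m.group(1))
    raise ValueError("unsupported resource spec: " + spec)


if __name__ == "__main__":
    for spec in ("nodes=4:ppn=2", "nodes=64", "ncpus=8"):
        print(spec, requested_cpus(spec))
```

With the exact node match policy disabled, "nodes=64" is treated here as
64 CPUs with no constraint on the node count, which is the case the
"-l ncpus=x" proposal would make explicit.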

