[torquedev] nodes, procs, tpn and ncpus

Martin Siegert siegert at sfu.ca
Thu Jun 10 13:12:19 MDT 2010


Hi Garrick,

On Thu, Jun 10, 2010 at 11:43:26AM -0700, Garrick Staples wrote:
> On Thu, Jun 10, 2010 at 11:27:01AM -0700, Martin Siegert alleged:
> > On Wed, Jun 09, 2010 at 06:01:31PM -0700, Garrick Staples wrote:
> > > On Wed, Jun 09, 2010 at 08:52:08PM -0400, Glen Beane alleged:
> > > > On Wed, Jun 9, 2010 at 8:31 PM, Garrick Staples <garrick at usc.edu> wrote:
> > > > > I know I'm getting in on this conversation late, but here is my fantasy:
> > > > >
> > > > > nodes=X gives X number of cpus. Packed. Your job is CPU bound and you don't
> > > > > care how they are packed.
> > > > 
> > > > blah.  that is overloading the meaning of nodes.  I like the new
> > > > procs=X instead. It basically means the same thing: you get X
> > > > processors, and moab seems to pack them onto as few nodes as possible.
> > > > TORQUE doesn't do anything with procs yet...
> > > 
> > > Nothing is overloaded. "nodes" has always translated to "vnodes" inside of
> > > torque. If you don't specify ppn, then you don't care about where your
> > > processors land. Perfectly logical. This case also covers the vast majority of
> > > jobs.
> > 
> > I am with Glen: nodes=X is just an abbreviation for nodes=X:ppn=1 - it always
> > has been that way. That ppn=1 means "packed" is totally counterintuitive
> > - none of our users ever understood it this way. We were actually forced
> > to set EXACTNODE because that is the behaviour users expect when specifying
> > processors-per-node. This is not about what we like, but about a sensible
> > user interface that is intuitive for users. Giving a user 5 processors on
> > the same node when specifying ppn=1 is not what users expect.
> 
> Other than "nodes=X is just an abbreviation for nodes=X:ppn=1", you just
> agreed with me. nodes=X:ppn=Y should not be packed.

I agree - I misunderstood your proposal - much relieved :-)

> > > > > nodes=X:ppn=Y gives you X unique nodes with Y cpus per machine. Not-packed.
> > 
> > It has not been that way: nodes=X:ppn=Y gave you any multiple of Y cpus
> > on a node, i.e., packed (and this includes the nodes=X (= nodes=X:ppn=1)
> > case).
> 
> You agree with me again! You and your users want it the way I said, e.g.
> EXACTNODE.

agreed.
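To make the two readings of nodes=X:ppn=Y in this exchange concrete, here is a
minimal sketch contrasting "packed" placement (a host may receive any multiple
of ppn cpus) with EXACTNODE-style placement (X distinct hosts, exactly ppn cpus
each). The functions place_packed/place_exact and the free-cpu dictionary are
hypothetical illustrations, not code from torque, maui or moab.

```python
from typing import Dict, Optional


def place_packed(free: Dict[str, int], nodes: int, ppn: int) -> Optional[Dict[str, int]]:
    """Packed: hand out `nodes` chunks of `ppn` cpus, several chunks per host allowed."""
    alloc: Dict[str, int] = {}
    remaining = nodes
    # Visit hosts with the most free cpus first so the job lands on as few hosts as possible.
    for host, cpus in sorted(free.items(), key=lambda kv: -kv[1]):
        chunks = min(cpus // ppn, remaining)
        if chunks > 0:
            alloc[host] = chunks * ppn  # always a multiple of ppn
            remaining -= chunks
        if remaining == 0:
            return alloc
    return None  # not enough free cpus anywhere


def place_exact(free: Dict[str, int], nodes: int, ppn: int) -> Optional[Dict[str, int]]:
    """EXACTNODE-style: `nodes` distinct hosts with exactly `ppn` cpus on each."""
    hosts = [h for h, cpus in sorted(free.items()) if cpus >= ppn]
    if len(hosts) < nodes:
        return None  # not enough distinct hosts with room for ppn cpus
    return {h: ppn for h in hosts[:nodes]}


# Six idle 8-core hosts, request nodes=5:ppn=1.
free = {f"n{i:02d}": 8 for i in range(1, 7)}
print(place_packed(free, 5, 1))  # all 5 cpus on a single host
print(place_exact(free, 5, 1))   # one cpu on each of 5 hosts
```

With ppn=1 the packed policy puts all five processors on one host, which is
exactly the surprise described above for users who asked for one processor per
node.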

> > > > > This lets you spread IO around because you know you need it.
> > > > 
> > > > 
> > > > 
> > > > here is what I want
> > > > 
> > > > procs=X gives you X processors, user doesn't care about layout (hack that
> > > > works with Moab, should be made to work properly with pbs_sched/qrun)
> > > > nodes=X:ppn=Y gives you exactly X unique nodes with Y processors per node
> > > > nodes=X - I'm not sure about this one, but to preserve historic behavior I
> > > > think TORQUE should give you X nodes with one processor on each node (Moab
> > > > can have an option to treat it like procs=X, which is the current behavior)
> > 
> > Agreed. This is what we want as well.
> 
> So we are in agreement that "nodes=X:ppn=Y" should not be packed. Great.
> 
>  
> > > Getting torque to jive procs with nodes is a lot more work.
> > > 
> > > My plan is easy, simple, and I think covers everyone's use cases.
> > 
> > It does not cover our use cases. Furthermore, having ppn not mean
> > processors-per-node results in a never ending support problem.
> 
> Here you lost me. You kept agreeing with me that "nodes=X:ppn=Y" should not be
> packed, but this doesn't cover your uses?

I missed your point that you want to break the meaning that nodes=X is
just a shorthand for nodes=X:ppn=1.
While our users are used to nodes=X (in its current meaning), I am not
too concerned about our current users: nodes=X has not been common
usage since procs was introduced. However, I am concerned about new
users: intuitively, nodes=X requests nodes, not processors, so there
is again the potential to create a support nightmare.
Furthermore, most of our users by now use procs. Eliminating procs
would be a big problem.

> > > Everyone has always wanted "gimme X cores, anywhere". The solution is to not
> > > use EXACTNODE and "nodes=X" does what you want. But EXACTNODE breaks the
> > > "nodes=X:ppn=y" case. If we just change maui/moab to not pack jobs with ppn,
> > > then we are done.
> > 
> > That is not a solution. If we do not set EXACTNODE, then users who need
> > nodes=N:ppn=1 (in its literal meaning, namely exactly one processor per
> > node) cannot be satisfied. And if we do set EXACTNODE, there is no way
> > (other than procs) to request N processors anywhere. This is the reason
> > why procs was introduced in the first place: so that we can set EXACTNODE
> > and satisfy both types of requests.
> 
> It is a *proposed* solution. It doesn't exist today; code in maui/moab would
> have to be written.
> 
> EXACTNODE behaviour for "nodes=X:ppn=Y", but not for "nodes".
> 
> My proposal requires no changes in torque, very minor changes in maui/moab, and
> little user re-education because they already know the word "nodes".
> 
> The only place where we disagree is that you want to use "procs=X" where I want
> to use "nodes=X". I see 2 major downsides: lots of coding work in torque, and
> more confusing semantics when the two are mixed (what does "-l nodes=X,procs=Y" mean?)

(Pardon my ignorance - I am not an expert in the intricacies of the torque
code.) Why is implementing nodes=X (in your new meaning) so much easier
than procs=X? This is "just" the name of a resource; can it not be mapped
when the resource requests are parsed?
With respect to the meaning of nodes=X,procs=Y (old interpretation):
give me X nodes with exactly one processor on each node plus Y processors
anywhere (i.e., none of the Y procs must end up on the X nodes; the request
is additive).
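As an illustration of "mapping the name when the resource requests are parsed"
and of the additive reading of nodes=X,procs=Y given above, here is a minimal
sketch. The Request class and parse_resources function are hypothetical; this is
not torque's actual parser, it only handles the simple "nodes=X[:ppn=Y],procs=Z"
forms used in this thread, and it ignores other node properties.

```python
from dataclasses import dataclass


@dataclass
class Request:
    nodes: int = 0  # distinct hosts requested via nodes=
    ppn: int = 1    # cpus per host; nodes=X is shorthand for nodes=X:ppn=1
    procs: int = 0  # cpus anywhere, in addition to the nodes= part

    def total_cpus(self) -> int:
        # Additive reading: nodes*ppn cpus on distinct hosts plus `procs` anywhere.
        return self.nodes * self.ppn + self.procs


def parse_resources(spec: str) -> Request:
    """Parse a hypothetical -l string such as 'nodes=4:ppn=2,procs=8'."""
    req = Request()
    for item in spec.split(","):
        name, _, value = item.partition("=")
        if name == "nodes":
            count, *props = value.split(":")
            req.nodes = int(count)
            for prop in props:
                key, _, val = prop.partition("=")
                if key == "ppn":
                    req.ppn = int(val)
                # Other node properties (e.g. feature names) are ignored in this sketch.
        elif name == "procs":
            req.procs = int(value)
        else:
            raise ValueError(f"unsupported resource: {name}")
    return req


print(parse_resources("nodes=3,procs=5").total_cpus())
```

Under this reading, "nodes=3,procs=5" asks for 3 hosts with one processor each
plus 5 more processors anywhere, 8 processors in total.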

- Martin

-- 
Martin Siegert
Head, Research Computing
WestGrid Site Lead
IT Services                                phone: 778 782-4691
Simon Fraser University                    fax:   778 782-4242
Burnaby, British Columbia                  email: siegert at sfu.ca
Canada  V5A 1S6

