[torquedev] nodes, procs, tpn and ncpus

Garrick Staples garrick at usc.edu
Thu Jun 10 12:43:26 MDT 2010


On Thu, Jun 10, 2010 at 11:27:01AM -0700, Martin Siegert alleged:
> On Wed, Jun 09, 2010 at 06:01:31PM -0700, Garrick Staples wrote:
> > On Wed, Jun 09, 2010 at 08:52:08PM -0400, Glen Beane alleged:
> > > On Wed, Jun 9, 2010 at 8:31 PM, Garrick Staples <garrick at usc.edu> wrote:
> > > > I know I'm getting in on this conversation late, but here is my fantasy:
> > > >
> > > > nodes=X gives X number of cpus. Packed. Your job is CPU bound and you don't
> > > > care how they are packed.
> > > 
> > > blah.  that is overloading the meaning of nodes.  I like the new
> > > procs=X instead. It basically means the same thing,  you get X
> > > processors, moab seems to pack them on as few nodes as possible.
> > > TORQUE doesn't do anything with procs yet...
> > 
> > Nothing is overloaded. "nodes" has always translated to "vnodes" inside of
> > torque. If you don't specify ppn, then you don't care about where your
> > processors land. Perfectly logical. This case also covers the vast majority of
> > jobs.
> 
> I am with Glen: nodes=X is just an abbreviation for nodes=X:ppn=1 - it always
> has been that way. That ppn=1 means "packed" is totally counterintuitive
> - none of our users ever understood it this way. We were actually forced
> to set EXACTNODE because that is the syntax users expect from specifying
> processors-per-node. This is not about what we like, but about a sensible
> user interface that is intuitive for users. Giving a user 5 processors on
> the same node when specifying ppn=1 is not what users expect.

Other than "nodes=X is just an abbreviation for nodes=X:ppn=1", you just
agreed with me. nodes=X:ppn=Y should not be packed.



 
> > > > nodes=X:ppn=Y gives you X unique nodes with Y cpus per machine. Not-packed.
> 
> This has not been that way: nodes=X:ppn=Y gave you any multiple of Y cpus
> on a node, i.e., packed (and this includes the nodes=X (= nodes=X:ppn=1)
> case).

You agree with me again! You and your users want it the way I said, e.g.
EXACTNODE.


 
> > > > This lets you spread IO around because you know you need it.
> > > 
> > > 
> > > 
> > > here is what I want
> > > 
> > > procs=X gives you X processors, user doesn't care about layout (hack that
> > > works with Moab, should be made to work properly with pbs_sched/qrun)
> > > nodes=X:ppn=Y gives you exactly X unique nodes with Y processors per node
> > > nodes=X - I'm not sure about this one, but to preserve historic behavior I
> > > think TORQUE should give you X nodes with one processor on each node (Moab
> > > can have an option to treat it like procs=X, which is the current behavior)
> 
> Agreed. This is what we want as well.

So we are in agreement that "nodes=X:ppn=Y" should not be packed. Great.

 
> > Getting torque to jibe procs with nodes is a lot more work.
> > 
> > My plan is easy, simple, and I think covers everyone's use cases.
> 
> It does not cover our use cases. Furthermore, having ppn not mean
> processors-per-node results in a never ending support problem.

Here you lost me. You kept agreeing with me that "nodes=X:ppn=Y" should not be
packed, but this doesn't cover your uses?

 
> > Everyone has always wanted "gimme X cores, anywhere". The solution is to not
> > use EXACTNODE and "nodes=X" does what you want. But EXACTNODE breaks the
> > "nodes=X:ppn=y" case. If we just change maui/moab to not pack jobs with ppn,
> > then we are done.
> 
> That is not a solution. If we do not set EXACTNODE, then users who need
> nodes=N:ppn=1 (in its very meaning, namely exactly one processor per
> node) cannot be satisfied. And if we do set EXACTNODE, there is no way
> (other than procs) to request N processors anywhere. This is the reason
> why procs was introduced in the first place: so that we can set EXACTNODE
> and satisfy both types of requests.

It is a *proposed* solution. It doesn't exist today. Code in maui/moab would
have to be written.

The proposal is EXACTNODE behaviour for "nodes=X:ppn=Y", but not for plain
"nodes=X".

My proposal requires no changes in torque, very minor changes in maui/moab, and
little user re-education because they already know the word "nodes".
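
To make the proposed rule concrete, here is a rough Python sketch of the
placement semantics. It is only an illustration, not maui/moab code: the
function names (parse_nodes, place) and the "free" map of host name to free
cpus are made up for this example, and it only understands the two request
forms discussed in this thread.

```python
import re

def parse_nodes(spec):
    """Parse 'nodes=X' or 'nodes=X:ppn=Y' into (X, Y); Y is None without ppn."""
    m = re.fullmatch(r"nodes=(\d+)(?::ppn=(\d+))?", spec)
    if m is None:
        raise ValueError(f"unsupported request: {spec!r}")
    count = int(m.group(1))
    ppn = int(m.group(2)) if m.group(2) else None
    return count, ppn

def place(spec, free):
    """Return {host: cpus} for a request, or None if it cannot be satisfied.

    free maps host name -> free cpus (hypothetical cluster state).
    """
    count, ppn = parse_nodes(spec)
    alloc = {}
    if ppn is None:
        # Plain nodes=X: X cpus anywhere, packed onto as few hosts as possible
        # by filling the hosts with the most free cpus first.
        remaining = count
        for host, cpus in sorted(free.items(), key=lambda kv: (-kv[1], kv[0])):
            if remaining == 0:
                break
            take = min(cpus, remaining)
            if take > 0:
                alloc[host] = take
                remaining -= take
        return alloc if remaining == 0 else None
    # nodes=X:ppn=Y: exactly X distinct hosts with Y cpus on each (not packed).
    for host in sorted(free):
        if len(alloc) == count:
            break
        if free[host] >= ppn:
            alloc[host] = ppn
    return alloc if len(alloc) == count else None
```

With free = {"n1": 4, "n2": 4, "n3": 2}, place("nodes=5", free) packs the job
onto two hosts, while place("nodes=2:ppn=1", free) puts one cpu on each of two
different hosts.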

The only place where we disagree is that you want to use "procs=X" where I want
to use "nodes=X". I see 2 major downsides: lots of coding work in torque, and
more confusing semantics with mixed requests (what does "-l nodes=X,procs=Y"
mean?).
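
As a small illustration of that ambiguity, the Python sketch below parses a
mixed request and lists a few cpu totals it could plausibly mean. The three
readings and both function names are hypothetical, invented for this example;
none of them is documented torque, maui or moab behaviour, and the parser only
handles key=integer items.

```python
def parse_resource_list(arg):
    """Split a '-l' value such as 'nodes=2:ppn=4,procs=8' into a dict of ints."""
    out = {}
    for item in arg.split(","):
        for part in item.split(":"):
            key, _, value = part.partition("=")
            out[key] = int(value)
    return out

def total_cpus_readings(arg):
    """Return the total cpu count under three hypothetical interpretations."""
    req = parse_resource_list(arg)
    # cpus implied by the nodes part alone (ppn defaults to 1)
    from_nodes = req.get("nodes", 0) * req.get("ppn", 1)
    procs = req.get("procs", 0)
    return {
        "procs_overrides_nodes": procs,
        "procs_added_to_nodes": from_nodes + procs,
        "larger_of_the_two": max(from_nodes, procs),
    }
```

For "nodes=4,procs=2" the three readings give 2, 6 and 4 cpus, so the scheduler
and the user would both have to agree on which one is meant.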

-- 
Garrick Staples, GNU/Linux HPCC SysAdmin
University of Southern California

Life is Good!