[torquedev] nodes, procs, tpn and ncpus

Martin Siegert siegert at sfu.ca
Wed Jun 9 08:37:37 MDT 2010


On Wed, Jun 09, 2010 at 06:57:26AM -0600, Ken Nielson wrote:
> Currently when TORQUE is asked to run a job with qrun it interprets the nodes=x as only a single node. Glen, if you look at listelem and node_spec you will see this is the case. TORQUE also ignores procs and ncpus.
> 
> I am going to modify TORQUE so it will process these resources more like we expect. 
> 
> procs=x will mean give me x processors anywhere.
> 
> nodes=x will mean the same as procs=x.

nodes=x has always been fully synonymous with nodes=x:ppn=1, and our users are
used to that. That does not mean it can't be changed, but it would be a big change.

> nodes=x:ppn=x will work as it currently does except that the value for nodes will not be ignored. 
> That is a node spec of -l nodes=2:ppn=2 will look for two nodes with two available processors. This can be satisfied on the same host or different hosts. Currently this node spec will only get two processors on a single node.

We worked hard to teach our users that nodes=x:ppn=y does exactly what it
says, namely it gives you x nodes with y processors each: ppn means
processors-per-node, i.e., ppn=2 gives you two processors per node, not 4
or 6 or 8. Ever since procs was introduced we have been using
JOBNODEMATCHPOLICY EXACTNODE
and it would be very difficult for us if that meaning were to change.
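
To make the reading we rely on concrete, here is a minimal Python sketch of
how a nodes=x:ppn=y request expands under the exact-node interpretation. The
function name expand_exact is hypothetical and this is not TORQUE's or the
scheduler's actual parser; it only illustrates the meaning described above,
including nodes=x being treated as nodes=x:ppn=1.

```python
# Hypothetical helper for illustration only; not part of TORQUE.
def expand_exact(spec):
    """Expand 'nodes=x[:ppn=y]' into a list with one entry per node,
    each entry being the number of processors requested on that node."""
    prefix = "nodes="
    if not spec.startswith(prefix):
        raise ValueError("expected a spec starting with 'nodes='")
    parts = spec[len(prefix):].split(":")
    nodes = int(parts[0])
    ppn = 1  # assumption from the text: nodes=x alone means nodes=x:ppn=1
    for part in parts[1:]:
        key, _, value = part.partition("=")
        if key == "ppn":
            ppn = int(value)
    # Exact-node reading: x distinct nodes, y processors on each of them.
    return [ppn] * nodes


print(expand_exact("nodes=2:ppn=2"))  # [2, 2]
print(expand_exact("nodes=4"))        # [1, 1, 1, 1]
```

Under this reading nodes=2:ppn=2 always means two nodes with two processors
each, never four processors on one host.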

> ncpus=x will allocate x processors to a single task. They must be on the same host.

ncpus has been a source of confusion for a long time, because users do not
understand its meaning: they assume it works the way procs works today.
I somewhat question the wisdom of perpetuating this confusion,
particularly as nodes=1:ppn=x appears to me to be an equivalent but much
clearer request.
Alternatives: a) eliminate ncpus completely, b) make it work like procs.
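
As a rough illustration of the equivalence I have in mind, assuming ncpus=x
is defined as quoted above (x processors on a single host), a request could
be rewritten mechanically into the clearer form. The function
ncpus_to_nodespec is a hypothetical name, not an existing TORQUE routine.

```python
# Hypothetical helper for illustration only; not part of TORQUE.
def ncpus_to_nodespec(resource):
    """Rewrite 'ncpus=x' as the equivalent single-node request 'nodes=1:ppn=x'."""
    key, _, value = resource.partition("=")
    if key != "ncpus":
        raise ValueError("expected a resource of the form 'ncpus=x'")
    count = int(value)
    if count < 1:
        raise ValueError("ncpus must be at least 1")
    # All x processors must sit on the same host, hence exactly one node.
    return "nodes=1:ppn={}".format(count)


print(ncpus_to_nodespec("ncpus=4"))  # nodes=1:ppn=4
```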

> tpn can be used like ppn and it will be interpreted to mean use exactly x processors from each node.
> A node spec of nodes=2:tpn=2 will allocate two processors on one node and two processors on a separate node.

We have never used "tpn". What does it stand for? As mentioned above, this
would be a big change for us.

> I am interested in your input. 

For me this is less about what I would like. It is more important to give
users an interface that is intuitive. For example, having ppn mean anything
other than processors-per-node would be confusing.

Cheers,
Martin

-- 
Martin Siegert
Head, Research Computing
WestGrid Site Lead
IT Services                                phone: 778 782-4691
Simon Fraser University                    fax:   778 782-4242
Burnaby, British Columbia                  email: siegert at sfu.ca
Canada  V5A 1S6

