[torquedev] nodes, procs, tpn and ncpus
Martin Siegert
siegert at sfu.ca
Wed Jun 9 08:37:37 MDT 2010
On Wed, Jun 09, 2010 at 06:57:26AM -0600, Ken Nielson wrote:
> Currently when TORQUE is asked to run a job with qrun it interprets the nodes=x as only a single node. Glen, if you look at listelem and node_spec you will see this is the case. TORQUE also ignores procs and ncpus.
>
> I am going to modify TORQUE so it will process these resources more like we expect.
>
> procs=x will mean give me x processors anywhere.
>
> nodes=x will mean the same as procs=x.
nodes=x has always been synonymous with nodes=x:ppn=1 - our users are used
to that. That does not mean it can't be changed, but it would be a big change.
> nodes=x:ppn=x will work as it currently does except that the value for nodes will not be ignored.
> That is a node spec of -l nodes=2:ppn=2 will look for two nodes with two available processors. This can be satisfied on the same host or different hosts. Currently this node spec will only get two processors on a single node.
We worked hard to teach our users that nodes=x:ppn=y does exactly what it
says, namely it gives you x nodes with y processors each: ppn meaning
processors-per-node, i.e., ppn=2 gives you two processors per node, not 4
or 6 or 8. Ever since procs was introduced we have been using
JOBNODEMATCHPOLICY EXACTNODE
and it would be very difficult for us if that meaning changed.
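To make the exact-node reading described above concrete, here is a minimal
Python sketch. It is not TORQUE or Moab code: exact_node_layout is a
hypothetical helper, and it assumes a purely numeric spec of the form
nodes=X[:ppn=Y] (no host names, no other properties).

```python
def exact_node_layout(spec):
    """Return per-node processor counts for a 'nodes=X[:ppn=Y]' request.

    Hypothetical helper for illustration only; it models the reading
    "X nodes with Y processors each" and nothing else.
    """
    fields = spec.split(":")
    key, _, value = fields[0].partition("=")
    if key != "nodes" or not value.isdigit():
        raise ValueError("expected a spec of the form nodes=X[:ppn=Y]")
    nodes = int(value)
    ppn = 1  # nodes=X alone is treated as nodes=X:ppn=1
    for field in fields[1:]:
        k, _, v = field.partition("=")
        if k == "ppn" and v.isdigit():
            ppn = int(v)
        else:
            raise ValueError(f"unsupported field: {field!r}")
    # One list entry per node, each holding that node's processor count.
    return [ppn] * nodes


print(exact_node_layout("nodes=2:ppn=2"))  # [2, 2]
print(exact_node_layout("nodes=4"))        # [1, 1, 1, 1]
```

Under this reading nodes=2:ppn=2 always means two nodes with two processors
each, never four processors packed onto a single host.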
> ncpus=x will allocate x processors to a single task. They must be on the same host.
ncpus has been a source of confusion for a long time, because users do not
understand its meaning - they assume it works like procs works today.
I somewhat question the wisdom of continuing with this misconception,
particularly as nodes=1:ppn=x appears to me to be an equivalent, but much
clearer, request.
Alternatives: a) eliminate ncpus completely, b) make it work like procs.
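The difference between the request kinds discussed here can be sketched in
a few lines of Python. This is a toy model, not scheduler code: can_satisfy
and the label "single_node_ppn" (standing for nodes=1:ppn=x) are
hypothetical, and the cluster is reduced to a list of free processor counts,
one per host.

```python
def can_satisfy(kind, x, free):
    """Check whether a request for x processors fits the free processors.

    kind: "procs" (x processors anywhere), "ncpus" (x processors on one
    host, as proposed in the quoted text), or "single_node_ppn"
    (nodes=1:ppn=x). free: free processor count per host.
    """
    if kind == "procs":
        # Processors may be spread over any number of hosts.
        return sum(free) >= x
    if kind in ("ncpus", "single_node_ppn"):
        # All x processors must come from the same host.
        return any(f >= x for f in free)
    raise ValueError(f"unknown request kind: {kind!r}")


free = [2, 2, 2]  # three hosts with two free processors each
for kind in ("procs", "ncpus", "single_node_ppn"):
    print(kind, can_satisfy(kind, 4, free))
```

With three hosts that each have two free processors, a request for four
processors fits only as procs. In this model ncpus=x and nodes=1:ppn=x
behave identically, which is the equivalence noted above.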
> tpn can be used like ppn and it will be interpreted to mean use exactly x processors from each node.
> A node spec of nodes=2:tpn=2 will allocate two processors on one node and two processors on a separate node.
We have never used "tpn". What does it stand for? As mentioned above, this
would be a big change for us.
> I am interested in your input.
For me this is less about what I would like. It is more important to give
users an interface that is intuitive; having ppn mean anything other
than processors-per-node is confusing.
Cheers,
Martin
--
Martin Siegert
Head, Research Computing
WestGrid Site Lead
IT Services phone: 778 782-4691
Simon Fraser University fax: 778 782-4242
Burnaby, British Columbia email: siegert at sfu.ca
Canada V5A 1S6