[torquedev] nodes, procs, tpn and ncpus

Glen Beane glen.beane at gmail.com
Wed Jun 9 07:45:27 MDT 2010


On Wed, Jun 9, 2010 at 8:57 AM, Ken Nielson
<knielson at adaptivecomputing.com> wrote:
> Currently when TORQUE is asked to run a job with qrun it interprets the nodes=x as only a single node. Glen, if you look at listelem and node_spec you will see this is the case. TORQUE also ignores procs and ncpus.


If I stop Moab on my cluster and run jobs with qrun, nodes=x is not
interpreted as a single node; it is basically interpreted as
nodes=x:ppn=1.

gbeane at wulfgar:~> echo "sleep 60" | qsub -l nodes=4,walltime=00:02:00
69795.wulfgar.jax.org
gbeane at wulfgar:~> qrun 69795
gbeane at wulfgar:~> qstat -f 69795
...
    exec_host = cs-prod-6/0+cs-prod-5/0+cs-prod-4/0+cs-prod-3/0
...
    Resource_List.neednodes = 4
    Resource_List.nodect = 4
    Resource_List.nodes = 4



> I am going to modify TORQUE so it will process these resources more like we expect.
>
> procs=x will mean give me x processors anywhere.

great

> nodes=x will mean the same as procs=x.

I don't think this should be the case... Moab reinterprets it to mean
the same thing, but historically with PBS that is not how it has been
interpreted.

> nodes=x:ppn=x will work as it currently does except that the value for nodes will not be ignored.

What do you mean, the value for nodes will not be ignored?  The value
for nodes is NOT ignored now.


gbeane at wulfgar:~> echo "sleep 60" | qsub -l nodes=2:ppn=4,walltime=00:01:00
69792.wulfgar.jax.org
gbeane at wulfgar:~> qrun 69792
gbeane at wulfgar:~> qstat -f 69792
...
    exec_host = cs-prod-2/3+cs-prod-2/2+cs-prod-2/1+cs-prod-2/0+cs-prod-1/3+cs
	-prod-1/2+cs-prod-1/1+cs-prod-1/0
...
    Resource_List.neednodes = 2:ppn=4
    Resource_List.nodect = 2
    Resource_List.nodes = 2:ppn=4


> That is a node spec of -l nodes=2:ppn=2 will look for two nodes with two available processors. This can be satisfied on the same host or different hosts. Currently this node spec will only get two processors on a single node.

This is _not_ true.  This works fine with TORQUE, unless you guys have
broken it.  People have been using nodes=2:ppn=2 since before Moab
even existed.

I suggest we keep the historic meaning of nodes=X:ppn=T (ignoring how
Moab chooses to reinterpret this...).  To TORQUE, nodes=X should mean
exactly X nodes.  If Moab admins CHOOSE to use a NODEMATCHPOLICY that
allows this to be interpreted otherwise, then that is fine.

To get the behavior you are suggesting in TORQUE (without Moab), we
should wait until select is implemented...


> ncpus=x will allocate x processors to a single task. They must be on the same host.

Sounds good.  I swear this used to work with qrun/pbs_sched.  I see
that for resources I am given nodes=1 ncpus=X, but I am only allocated
one virtual processor by TORQUE.


> tpn can be used like ppn and it will be interpreted to mean use exactly x processors from each node.
> A node spec of nodes=2:tpn=2 will allocate two processors on one node and two processors on a separate node.
>
> I am interested in your input.
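
As I read it, the proposed tpn expansion would look like this (a
sketch of the quoted proposal, not existing TORQUE code; the function
name is hypothetical):

```python
def proposed_tpn_slots(spec):
    """Expand a nodes=X:tpn=T spec under the *proposed* semantics
    quoted above: exactly T processors taken from each of X
    separate nodes (tpn defaults to 1).  Illustrative sketch only."""
    fields = spec.split(":")
    node_count = int(fields[0])
    tpn = 1
    for prop in fields[1:]:
        if prop.startswith("tpn="):
            tpn = int(prop[len("tpn="):])
    # one entry per node, each holding exactly tpn slots
    return [tpn] * node_count

# nodes=2:tpn=2 -> two processors on each of two separate nodes
print(proposed_tpn_slots("2:tpn=2"))   # [2, 2]
```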

Let's not introduce anything new until we have the select statement.

Let's keep the historic meaning of these things.  Ignore how Moab can
be configured to reinterpret them.  People use TORQUE without Moab.
Fix the things that are completely ignored by TORQUE, and wait for
select to implement new behaviors.




