[torquedev] nodes, procs, tpn and ncpus
glen.beane at gmail.com
Wed Jun 9 07:45:27 MDT 2010
On Wed, Jun 9, 2010 at 8:57 AM, Ken Nielson
<knielson at adaptivecomputing.com> wrote:
> Currently when TORQUE is asked to run a job with qrun it interprets the nodes=x as only a single node. Glen, if you look at listelem and node_spec you will see this is the case. TORQUE also ignores procs and ncpus.
If I stop Moab on my cluster and run jobs with qrun, nodes=x is not
interpreted as a single node. It is basically interpreted as
gbeane at wulfgar:~> echo "sleep 60" | qsub -l nodes=4,walltime=00:02:00
gbeane at wulfgar:~> qrun 69795
gbeane at wulfgar:~> qstat -f 69795
exec_host = cs-prod-6/0+cs-prod-5/0+cs-prod-4/0+cs-prod-3/0
Resource_List.neednodes = 4
Resource_List.nodect = 4
Resource_List.nodes = 4
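
As a rough illustration of how to read the exec_host line above, here is a small Python sketch that counts allocated slots per host. It assumes the host/slot+host/slot layout seen in the qstat output; the function name slots_per_host is made up for this example and is not part of TORQUE.

```python
from collections import Counter

def slots_per_host(exec_host: str) -> Counter:
    """Count allocated slots per host in a 'host/slot+host/slot' string."""
    counts = Counter()
    for entry in exec_host.split("+"):
        # Everything before the first "/" is taken as the host name.
        host, _, _slot = entry.partition("/")
        counts[host] += 1
    return counts

# exec_host value copied from the qstat -f output for job 69795.
alloc = slots_per_host("cs-prod-6/0+cs-prod-5/0+cs-prod-4/0+cs-prod-3/0")
print(len(alloc), dict(alloc))
```

For the nodes=4 job this reports four distinct hosts with one slot each, which is what the transcript shows.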
> I am going to modify TORQUE so it will process these resources more like we expect.
> procs=x will mean give me x processors anywhere.
> nodes=x will mean the same as procs=x.
I don't think this should be the case... Moab reinterprets it to mean
the same thing, but historically with PBS that is not how it has been
> nodes=x:ppn=x will work as it currently does except that the value for nodes will not be ignored.
What do you mean, the value for nodes will not be ignored? The value
for nodes is NOT ignored now.
gbeane at wulfgar:~> echo "sleep 60" | qsub -l nodes=2:ppn=4,walltime=00:01:00
gbeane at wulfgar:~> qrun 69792
gbeane at wulfgar:~> qstat -f 69792
exec_host = cs-prod-2/3+cs-prod-2/2+cs-prod-2/1+cs-prod-2/0+cs-prod-1/3+cs
Resource_List.neednodes = 2:ppn=4
Resource_List.nodect = 2
Resource_List.nodes = 2:ppn=4
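
To make the arithmetic behind that output explicit, the sketch below splits a simple nodes value into a node count and a ppn. It is only an illustration: parse_simple_nodes is a hypothetical helper, it handles just the "X" and "X:ppn=Y" forms used in this thread, and it assumes ppn is 1 when omitted. Host names, properties and "+"-joined specs are out of scope and would raise ValueError here.

```python
def parse_simple_nodes(value: str):
    """Parse 'X' or 'X:ppn=Y' into (node_count, ppn)."""
    count_part, *modifiers = value.split(":")
    ppn = 1  # assumed default when ppn is not given
    for mod in modifiers:
        key, _, val = mod.partition("=")
        if key == "ppn":
            ppn = int(val)
    return int(count_part), ppn

# Value copied from Resource_List.nodes for job 69792.
nodes, ppn = parse_simple_nodes("2:ppn=4")
print(nodes, ppn, nodes * ppn)
```

The node count of 2 agrees with Resource_List.nodect = 2 above, and 2 nodes at 4 processors each gives 8 slots in total.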
> That is, a node spec of -l nodes=2:ppn=2 will look for two nodes with two available processors. This can be satisfied on the same host or different hosts. Currently this node spec will only get two processors on a single node.
This is _not_ true. This works fine with TORQUE, unless you guys have
broken it. People have been using nodes=2:ppn=2 since before Moab
I suggest we keep the historic meaning of nodes=X:ppn=T (ignoring how
Moab chooses to reinterpret this...). To TORQUE, nodes=X should mean
exactly X nodes. If Moab admins CHOOSE to use
NODEMATCHPOLICY that allows this to be interpreted otherwise then that
To get the behavior you are suggesting in TORQUE (without Moab), we
should wait until select is implemented.
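
The difference between the historic reading and the proposed one can be sketched as a check on an allocation. This is a hedged illustration only: matches_historic_spec is a hypothetical helper, the host names node-a and node-b are invented, and the exec_host layout is assumed to be host/slot+host/slot as in the transcripts in this thread.

```python
from collections import Counter

def matches_historic_spec(exec_host: str, nodes: int, ppn: int) -> bool:
    """True if the allocation uses exactly `nodes` distinct hosts,
    each contributing exactly `ppn` slots."""
    per_host = Counter(entry.partition("/")[0] for entry in exec_host.split("+"))
    return len(per_host) == nodes and all(n == ppn for n in per_host.values())

# Two invented allocations for a nodes=2:ppn=2 request.
spread = "node-a/1+node-a/0+node-b/1+node-b/0"  # two hosts, two slots each
packed = "node-a/3+node-a/2+node-a/1+node-a/0"  # four slots on one host

print(matches_historic_spec(spread, 2, 2))
print(matches_historic_spec(packed, 2, 2))
```

Under the historic meaning only the first allocation qualifies; the second is the "same host" case that the proposal would also accept.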
> ncpus=x will allocate x processors to a single task. They must be on the same host.
Sounds good. I swear this used to work with qrun/pbsched. I see
that for resources I am given nodes=1 ncpus=X, but I am only allocated
one virtual processor by TORQUE.
> tpn can be used like ppn and it will be interpreted to mean use exactly x processors from each node.
> A node spec of nodes=2:tpn=2 will allocate two processors on one node and two processors on a separate node.
> I am interested in your input.
Let's not introduce anything new until we have the select statement.
Let's keep the historic meaning of these things. Ignore how Moab can
be configured to reinterpret them. People use TORQUE without Moab.
Fix things that are completely ignored by TORQUE, and wait for select
to implement new behaviors.