<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<meta content="text/html; charset=ISO-8859-1" http-equiv="Content-Type">
<body bgcolor="#ffffff" text="#000000">
On 06/10/2010 06:32 PM, Martin Siegert wrote:
<pre wrap="">On Thu, Jun 10, 2010 at 01:36:56PM -0600, Ken Nielson wrote:
<pre wrap="">On 06/10/2010 12:27 PM, Martin Siegert wrote:
That is not a solution. If we do not set EXACTNODE, then users who need
nodes=N:ppn=1 (in its very meaning, namely exactly one processor per
node) cannot be satisfied. And if we do set EXACTNODE, there is no way
(other than procs) to request N processors anywhere. This is the reason
why procs was introduced in the first place: so that we can set EXACTNODE
and satisfy both types of requests.
<pre wrap="">You may have seen in this discussion where Simon Toth and Glen Beane
were indicating that nodes=x:ppn=y allocates y processors on x separate
nodes and I was saying that it only allocates y processors on a single node.
It turns out we were both right. It depends on what you have in your
serverdb configuration. I have the server parameter
resources_available.nodect set and Simon and Glen did not. Simon and
Glen were running TORQUE's default behavior and TORQUE by default
allocates nodes the same as if EXACTNODE were set in Moab.
Moab muddies the waters by giving users the option to treat processors
like nodes (vnodes in the case of PBS Pro). This is certainly one source
of the confusion that exists on the meaning of different resources.
While Moab is consistent in how it interprets the procs resource, it is
ambiguous about the nodes resource. If JOBNODEMATCHPOLICY is not set
(the default), Moab treats processors as nodes. So -l nodes=x, where x is
greater than the number of physical nodes, will be treated like -l procs=x
provided TORQUE has set the resources_available.nodect parameter. By "set"
I mean that nodect is greater than the number of physical nodes.
After all this I just want to confirm what Martin has just written:
procs exists so users can request as many processors as a job
needs, independent of the number of available nodes. We now just need
TORQUE to recognize procs as well.
just a comment: nodect used to be a parameter that was absolutely
essential in the pre-procs days when we did not set EXACTNODE:
in that configuration a nodes file with, e.g.,
would only allow you to run a job with a maximum of 200 processors
(using a -l nodes=N request). You needed to set nodect=800 to allow jobs
with -l nodes=400 or so. I always regarded nodect as an ugly workaround.
If it turns out that unsetting nodect (or eliminating nodect) plus
introducing procs basically implements the EXACTNODE + procs policies
in torque, then I believe that that is an excellent solution.
Here is the explanation of nodect from the troubleshooting section of
the TORQUE docs on the Cluster Resources site.
<a class="moz-txt-link-freetext" href="http://www.clusterresources.com/torquedocs/10.1troubleshooting.shtml">http://www.clusterresources.com/torquedocs/10.1troubleshooting.shtml</a> <br>
<h4>qsub will not allow the submission of jobs requesting many processors</h4>
TORQUE's definition of a node is context sensitive and can appear
inconsistent. The qsub
'<b>-l nodes=&lt;X&gt;</b>' expression can at times indicate a request
for <b>X</b> processors and at other times be interpreted as a request for
<b>X</b> nodes. While <b>qsub</b> allows multiple interpretations of
the keyword <i>nodes</i>, aspects of the TORQUE server's logic are not
so flexible. Consequently, if a job is using '-l nodes' to specify
processor count and the requested number of processors exceeds the
available number of physical nodes, the server daemon will reject the job.
<p> To get around this issue, the server can be told it has an <i>inflated</i>
number of nodes using the <b>resources_available</b> attribute. To
take effect, this attribute should be set on both the server and the
associated queue, as in the example below.<br>
Qmgr: set server resources_available.nodect=2048
Qmgr: set queue batch resources_available.nodect=2048
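As a rough illustration of the check described in that excerpt, here is a toy Python model. It is not TORQUE code: the function name server_accepts, its arguments, and the node counts are made up for the example. It assumes the server simply compares the requested count against nodect when nodect is set, and against the physical node count otherwise.

```python
def server_accepts(requested, physical_nodes, nodect=None):
    """Toy model: does a '-l nodes=N' request pass the server's node-count check?

    requested      -- the N in '-l nodes=N'
    physical_nodes -- hosts listed in the nodes file (hypothetical value)
    nodect         -- resources_available.nodect, or None if unset
    """
    # Assumption: with nodect unset, the limit is the physical node count.
    limit = physical_nodes if nodect is None else nodect
    return limit >= requested


# Hypothetical cluster with 64 physical nodes.
print(server_accepts(128, 64))               # False: more "nodes" than hosts
print(server_accepts(128, 64, nodect=2048))  # True: inflated nodect lets it through
```

With nodect=2048, as in the Qmgr lines above, the same request for 128 "nodes" is accepted even though only 64 hosts exist in this made-up cluster.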
It seems this feature is where the ambiguity of nodes originates. By
default -l nodes=x directs TORQUE to allocate a processor from x
distinct nodes. nodect changes the meaning of nodes from a host to that
of a processor or virtual processor. We do not need to change this
behavior, nor do we want to, because many sites out there
now depend on it. But we can add the procs functionality to TORQUE
and we can change the emphasis of the documentation to direct users to
use the procs keyword to just allocate processors.<br>
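To make the three cases concrete, here is a small Python sketch of how I read the behavior described above. It is only a model for discussion, not TORQUE or Moab code; the function name allocation_shape and the nodect_inflated flag are invented for illustration, and the procs branch describes the proposed behavior, not something TORQUE does today.

```python
def allocation_shape(resource, x, nodect_inflated=False):
    """Toy model of what a '-l <resource>=x' request ends up meaning.

    resource        -- "nodes" or "procs"
    x               -- the requested count
    nodect_inflated -- True if resources_available.nodect is set above
                       the number of physical nodes
    """
    if resource == "procs":
        # Proposed: procs always means processors, wherever they are.
        return f"{x} processors on any nodes"
    if resource == "nodes":
        if nodect_inflated:
            # With an inflated nodect, "nodes" effectively means processors.
            return f"{x} processors on any nodes"
        # TORQUE default: one processor from each of x distinct hosts.
        return f"1 processor on each of {x} distinct nodes"
    raise ValueError(f"unknown resource: {resource}")


print(allocation_shape("nodes", 4))
print(allocation_shape("nodes", 4, nodect_inflated=True))
print(allocation_shape("procs", 4))
```

The point of the sketch is that only the nodes branch depends on server configuration, which is why steering users toward procs removes the ambiguity.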
There are other ways in which users will want to allocate nodes and
processes, but that can be taken care of in a select statement. This
discussion has only been a precursor to what we want to be able to do