[torquedev] nodes, procs, tpn and ncpus

Ken Nielson knielson at adaptivecomputing.com
Fri Jun 11 10:27:11 MDT 2010


On 06/10/2010 06:32 PM, Martin Siegert wrote:
> On Thu, Jun 10, 2010 at 01:36:56PM -0600, Ken Nielson wrote:
>    
>> On 06/10/2010 12:27 PM, Martin Siegert wrote:
>>      
>>> That is not a solution. If we do not set EXACTNODE, then users who
>>> need nodes=N:ppn=1 (in its very meaning, namely exactly one processor
>>> per node) cannot be satisfied. And if we do set EXACTNODE, there is
>>> no way (other than procs) to request N processors anywhere. This is
>>> the reason why procs was introduced in the first place: so that we
>>> can set EXACTNODE and satisfy both types of requests.
>>>
>>> Cheers,
>>> Martin
>>>
>>>
>>>        
>> You may have seen earlier in this discussion that Simon Toth and Glen
>> Beane said nodes=x:ppn=y allocates y processors on each of x separate
>> nodes, while I said it only allocates y processors on a single node.
>>
>> It turns out we were both right. It depends on what you have in your
>> serverdb configuration. I have the server parameter
>> resources_available.nodect set, and Simon and Glen did not. They were
>> running TORQUE's default behavior, and TORQUE by default allocates
>> nodes the same way it would if EXACTNODE were set in Moab.
>>
>> Moab muddies the waters by giving users the option to treat processors
>> like nodes (vnodes in the case of PBS Pro). This is certainly one
>> source of the confusion about the meaning of different resources.
>> While Moab is consistent in how it interprets the procs resource, it
>> is ambiguous with the nodes resource. If JOBNODEMATCHPOLICY is not set
>> (the default), Moab treats processors as nodes. So -l nodes=x, where x
>> is greater than the number of physical nodes, will be treated like
>> -l procs=x, provided TORQUE has the resources_available.nodect
>> parameter set. By set I mean that nodect is greater than the number
>> of physical nodes.
>>
>> After all this I just want to confirm what Martin has just written:
>> procs exists so that users can allocate a job with as many processors
>> as needed, independent of the number of available nodes. We now just
>> need TORQUE to recognize procs as well.
>>
>> Ken Nielson
>> Adaptive Computing
>>      
> just a comment: nodect used to be a parameter that was absolutely
> essential in the pre-procs days when we did not set EXACTNODE:
> in that configuration a nodes file with, e.g.,
>
> n1 np=4
> n2 np=4
> ...
> n200 np=4
>
> would only allow you to run a job with a maximum of 200 processors
> (using a -l nodes=N request). You needed to set nodect=800 to allow jobs
> with -l nodes=400 or so. I always regarded nodect as an ugly workaround.
> If it turns out that unsetting nodect (or eliminating it altogether)
> plus introducing procs basically implements the EXACTNODE + procs
> policies in TORQUE, then I believe that is an excellent solution.
>
> Cheers,
> Martin
>
>    
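
To make the two request types concrete, here is a sketch of the two
submissions (job.sh is just a placeholder script, and this assumes Moab
is running with JOBNODEMATCHPOLICY EXACTNODE):

# exactly one processor on each of four distinct hosts
qsub -l nodes=4:ppn=1 job.sh

# sixteen processors, packed onto however many hosts are free
qsub -l procs=16 job.sh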

Here is the explanation of nodect from the troubleshooting section of
the TORQUE docs on the Cluster Resources site:
http://www.clusterresources.com/torquedocs/10.1troubleshooting.shtml

_______________________________________________________________________________________________________________


        qsub will not allow the submission of jobs requesting many
        processors

TORQUE's definition of a node is context sensitive and can appear
inconsistent. The qsub
<http://www.clusterresources.com/torquedocs/commands/qsub.shtml>
'-l nodes=<X>' expression can at times indicate a request for X
processors and at other times be interpreted as a request for X nodes.
While qsub allows multiple interpretations of the keyword nodes,
aspects of the TORQUE server's logic are not so flexible. Consequently,
if a job is using '-l nodes' to specify processor count and the
requested number of processors exceeds the available number of physical
nodes, the server daemon will reject the job.

To get around this issue, the server can be told it has an inflated
number of nodes using the resources_available attribute. To take
effect, this attribute should be set on both the server and the
associated queue as in the example below. See resources_available
<http://www.clusterresources.com/torquedocs/4.1queueconfig.shtml#resources_available>
for more information.

>  qmgr
Qmgr: set server resources_available.nodect=2048
Qmgr: set queue batch resources_available.nodect=2048
____________________________________________________________________________________________________________________________________
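
A minimal sketch of how this plays out, using Martin's 200-node (np=4)
cluster from above (job.sh is a placeholder):

# with only 200 physical nodes and nodect unset, the server
# daemon rejects this request:
qsub -l nodes=400 job.sh

# inflate nodect on both the server and the queue:
qmgr -c "set server resources_available.nodect=2048"
qmgr -c "set queue batch resources_available.nodect=2048"

# the same request is now accepted (and, with Moab's default
# JOBNODEMATCHPOLICY, treated like -l procs=400):
qsub -l nodes=400 job.sh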


It seems this feature is where the ambiguity of nodes originates. By
default, -l nodes=x directs TORQUE to allocate a processor from each of
x distinct nodes. Setting nodect changes the meaning of nodes from a
host to a processor or virtual processor. We do not need to change this
behavior, nor do we want to, because many sites now depend on it. But
we can add the procs functionality to TORQUE, and we can change the
emphasis of the documentation to direct users to the procs keyword when
they just want to allocate processors.
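
In other words, the same request line can mean two different things.
A sketch, assuming a cluster of four-processor hosts:

qsub -l nodes=16 job.sh

# default behavior (nodect unset): one processor on each of
# 16 distinct hosts
# with resources_available.nodect inflated above the host count:
# 16 processors, regardless of how many hosts they land on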

There are other ways in which users will want to allocate nodes and
processes, but those can be handled by a select statement. This
discussion has only been a precursor to what we want to be able to do
with select.
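
To give a flavor of the select syntax we have in mind (this borrows
PBS Pro's form; none of it is implemented in TORQUE yet):

# two chunks of four processors each, placed wherever they fit
qsub -l select=2:ncpus=4 job.sh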

Ken



