[torquedev] nodes, procs, tpn and ncpus

"Mgr. Šimon Tóth" SimonT at mail.muni.cz
Wed Jun 9 08:39:31 MDT 2010


On 9.6.2010 16:34, Glen Beane wrote:
> On Wed, Jun 9, 2010 at 10:24 AM, Bas van der Vlies <basv at sara.nl> wrote:
>> On 09-06-10 16:00, Glen Beane wrote:
>>> On Wed, Jun 9, 2010 at 9:45 AM, Glen Beane<glen.beane at gmail.com>  wrote:
>>>> On Wed, Jun 9, 2010 at 8:57 AM, Ken Nielson
>>>> <knielson at adaptivecomputing.com>  wrote:
>>>>> Currently when TORQUE is asked to run a job with qrun it interprets the nodes=x as only a single node. Glen, if you look at listelem and node_spec you will see this is the case. TORQUE also ignores procs and ncpus.
>>>>
>>>>
>>>> If I stop Moab on my cluster and run jobs with qrun nodes=x is not
>>>> interpreted as a single node.  It is basically interpreted as
>>>> nodes=x:ppn=1
>>>>
>>>> gbeane at wulfgar:~>  echo "sleep 60" | qsub -l nodes=4,walltime=00:02:00
>>>> 69795.wulfgar.jax.org
>>>> gbeane at wulfgar:~>  qrun 69795
>>>> gbeane at wulfgar:~>  qstat -f 69795
>>>> ...
>>>>     exec_host = cs-prod-6/0+cs-prod-5/0+cs-prod-4/0+cs-prod-3/0
>>>> ...
>>>>     Resource_List.neednodes = 4
>>>>     Resource_List.nodect = 4
>>>>     Resource_List.nodes = 4
>>>>
>>>
>>> by the way,  I was just looking through an old OpenPBS manual and from
>>> what I could tell if you requested a number of nodes without
>>> specifying ppn you would be given complete nodes. I don't have a
>>> problem with TORQUE assuming ppn=1 if it is not specified though.
>>
>>  From the openpbs v2_2.ers.pdf manual:
>> {{{
>> * nodes:  Number and/or type of nodes to be reserved for exclusive use by
>> the job.
>> . To ask for 12 nodes of any type: -l nodes=12
>> . To ask for 2 processors on each of four nodes:
>>   -l nodes=4:ppn=2
>> . To ask for 4 processors on one node:
>>   -l nodes=1:ppn=4
>> }}}
>>
> 
> 
> nodes hasn't meant exclusive use in a long time; I don't suggest we
> revert to that

It is something that should definitely be achievable.

Exclusive access to a node is a very common request (for I/O-, network-,
or memory-heavy jobs). The workaround of requesting all of a node's
resources simply doesn't work in a heterogeneous environment.

I currently have all nodes configured with only one processor, so the
shared/exclusive states are actually interpreted correctly. Luckily
that's OK for me, because I have a patch for generic resources on nodes
which handles the rest (CPUs, memory, etc.).
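For what it's worth, the interpretation discussed above (ppn defaulting
to 1 when omitted, so -l nodes=4 behaves like nodes=4:ppn=1) can be
sketched with a small helper. This is a hypothetical illustration, not
part of TORQUE, and only handles the simple "<count>[:ppn=<n>]" form:

```shell
# Hypothetical helper: compute the processor count implied by a
# -l nodes= request of the form "<count>[:ppn=<n>]".
# As discussed above, TORQUE assumes ppn=1 when ppn is not given.
nodes_to_procs() {
    spec="$1"
    count="${spec%%:*}"                  # node count before the first ':'
    case "$spec" in
        *ppn=*) ppn="${spec##*ppn=}" ;;  # value after 'ppn='
        *)      ppn=1 ;;                 # default when ppn is omitted
    esac
    echo $(( count * ppn ))
}

nodes_to_procs 4          # 4 nodes, ppn defaults to 1 -> 4 procs
nodes_to_procs 4:ppn=2    # 2 processors on each of 4 nodes -> 8 procs
```

This matches the exec_host output in Glen's example, where each of the
four nodes was assigned a single processor slot (cs-prod-6/0+...).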

-- 
Mgr. Šimon Tóth
