[torquedev] nodes, procs, tpn and ncpus

"Mgr. Šimon Tóth" SimonT at mail.muni.cz
Wed Jun 9 09:21:45 MDT 2010


On 9.6.2010 17:00, Ken Nielson wrote:
> On 06/09/2010 08:05 AM, "Mgr. Šimon Tóth" wrote:
>>> Currently, when TORQUE is asked to run a job with qrun, it interprets
>>> nodes=x as only a single node. Glen, if you look at listelem and
>>> node_spec you will see this is the case. TORQUE also ignores procs
>>> and ncpus.
>>>
>> Torque ignores any resource request (-l resource=value) when determining
>> where a job will run.
>>
>> Torque also ignores any such request inside the nodespec, with the
>> exception of ppn.
>>
>> For example, -l nodes=vmem=10G will simply never run.
>>
>>
>>> I am going to modify TORQUE so it will process these resources more
>>> like we expect.
>>>
>> If you wait one week, I will provide a full patch for generic resources.
>>
>>
> Can you send us a syntax interpretation of what will be in your patch?

My patch just adds generic resources on nodes (including ncpus as a
resource).

The nodespec syntax remains the same and works as expected
(at least, as I would expect):

-l nodes=2:mem=10G:ncpus=4:network=mirinet#matlab_licenses=1

"Give me two nodes, each with 10G of memory and 4 cpus, both on the
mirinet network. And also give me one matlab license."

(Actually, my current implementation would give you two matlab licenses.)
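To make the split between per-node and job-wide resources concrete, here is a minimal Python sketch of how the value after nodes= could be decomposed. It is not the patch's parser: the NodeSpec type, the parse_nodespec function, and the assumption that several job-wide resources after '#' would also be ':'-separated are all hypothetical, and multi-group specs joined with '+' are not handled.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class NodeSpec:
    """Hypothetical parsed form of the value given after 'nodes='."""
    count: int  # how many nodes are requested
    per_node: dict[str, str] = field(default_factory=dict)  # required on EACH node
    job_wide: dict[str, str] = field(default_factory=dict)  # required ONCE per job ('#' part)


def parse_nodespec(spec: str) -> NodeSpec:
    # Everything after '#' applies to the job as a whole, not to each node.
    node_part, _, job_part = spec.partition("#")
    head, *props = node_part.split(":")
    result = NodeSpec(count=int(head))
    for prop in props:
        name, _, value = prop.partition("=")
        result.per_node[name] = value
    # Assumption: several job-wide resources would also be ':'-separated.
    for item in filter(None, job_part.split(":")):
        name, _, value = item.partition("=")
        result.job_wide[name] = value
    return result


spec = parse_nodespec("2:mem=10G:ncpus=4:network=mirinet#matlab_licenses=1")
print(spec.count, spec.per_node, spec.job_wide)
# prints: 2 {'mem': '10G', 'ncpus': '4', 'network': 'mirinet'} {'matlab_licenses': '1'}
```

Keeping the '#' part in a separate map is what makes "one license per job" expressible; an implementation that attached it to every node would end up requesting count * 1 = 2 licenses.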

All resources are calculated on the server. The Torque FIFO scheduler
sends run requests with the full nodespec (including the names of the
scheduled nodes).

Because of this, the server doesn't have much to think about; it simply
verifies the request (some nodes could have died, or someone [another
scheduler/qrun] might have run something in the meantime).
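Since the scheduler already names the target nodes, what remains for the server is a feasibility check against the nodes' current state. The following Python sketch illustrates that idea under simplifying assumptions: NodeState, verify_run_request, and the request layout (node name mapped to required resources, with mem already converted to an integer number of GB) are hypothetical and are not taken from Torque or the patch.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class NodeState:
    """Hypothetical snapshot of a node as the server currently sees it."""
    up: bool
    free: dict[str, int | str]  # unallocated amounts (ints) or fixed properties (strings)


def verify_run_request(request: dict[str, dict[str, int | str]],
                       nodes: dict[str, NodeState]) -> list[str]:
    """Return the problems found; an empty list means the job can be started."""
    problems: list[str] = []
    for name, wanted in request.items():
        state = nodes.get(name)
        if state is None or not state.up:
            # The node may have died since the scheduler made its decision.
            problems.append(f"{name}: node unknown or down")
            continue
        for res, amount in wanted.items():
            have = state.free.get(res)
            if isinstance(amount, int):
                # Another scheduler or qrun may have used these resources meanwhile.
                if not isinstance(have, int) or have < amount:
                    problems.append(f"{name}: not enough {res}")
            elif have != amount:
                problems.append(f"{name}: {res} is {have!r}, wanted {amount!r}")
    return problems


nodes = {
    "node01": NodeState(up=True, free={"mem": 16, "ncpus": 8, "network": "mirinet"}),
    "node02": NodeState(up=False, free={"mem": 16, "ncpus": 8, "network": "mirinet"}),
}
print(verify_run_request({"node01": {"mem": 10, "ncpus": 4, "network": "mirinet"}}, nodes))
# prints: []
```

Returning a list of problems rather than a bare boolean lets the server explain why a run request was rejected.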

Resources reported by the node can be set automatically on the server.
This is controlled by two server attributes: the first selects which
resources should be taken from the MOM, the second allows resources to be
renamed (for example, totmem->vmem).
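A sketch of the two attributes acting together, in Python. The attribute names are not given here, so the function autoset_from_mom and its parameters selected and renames are hypothetical stand-ins for the first attribute (which MOM-reported resources to import) and the second (how to rename them); the sample MOM report is made up for illustration.

```python
from __future__ import annotations


def autoset_from_mom(reported: dict[str, str],
                     selected: set[str],
                     renames: dict[str, str]) -> dict[str, str]:
    """Derive server-side node resources from a MOM report (hypothetical sketch).

    selected: MOM resource names to import (role of the first server attribute)
    renames:  MOM name -> server resource name (role of the second attribute)
    """
    imported: dict[str, str] = {}
    for name, value in reported.items():
        if name in selected:
            # Rename if a mapping exists, otherwise keep the MOM's name.
            imported[renames.get(name, name)] = value
    return imported


report = {"ncpus": "4", "totmem": "16gb", "loadave": "0.50"}
print(autoset_from_mom(report, {"ncpus", "totmem"}, {"totmem": "vmem"}))
# prints: {'ncpus': '4', 'vmem': '16gb'}
```

In this sketch the selection is matched against the MOM's own names, before renaming; whether the real attributes work that way is an assumption.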

Any resource set in the nodes file or using qmgr supersedes the resource
taken from the node. For example, say you have a node that reports 4 cpus
and you generally want to take this value from the node. Now you want to
do some maintenance and need two of its cpus for that: you can simply set
the cpus resource, and when you are done, unset it (the value will then be
taken from the node again).
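The precedence rule (an administrator-set value wins, and unsetting it falls back to the value reported by the node) can be modelled as two layered dictionaries. This Python sketch is hypothetical: the NodeResources class and its method names are illustrative only, not the patch's data structures.

```python
from __future__ import annotations


class NodeResources:
    """Hypothetical model: values set via the nodes file or qmgr override MOM-reported ones."""

    def __init__(self) -> None:
        self.from_mom: dict[str, int] = {}  # latest values reported by the node
        self.manual: dict[str, int] = {}    # administrator overrides

    def report(self, name: str, value: int) -> None:
        # Called whenever the MOM reports a fresh value.
        self.from_mom[name] = value

    def set_override(self, name: str, value: int) -> None:
        # Administrator override; takes precedence over the reported value.
        self.manual[name] = value

    def unset_override(self, name: str) -> None:
        # Drop the override so the reported value becomes effective again.
        self.manual.pop(name, None)

    def effective(self, name: str) -> int | None:
        if name in self.manual:
            return self.manual[name]
        return self.from_mom.get(name)


# The maintenance scenario from the text: the node reports 4 cpus, two are held back.
node = NodeResources()
node.report("cpus", 4)
node.set_override("cpus", 2)
print(node.effective("cpus"))  # prints: 2
node.unset_override("cpus")
print(node.effective("cpus"))  # prints: 4
```

Keeping the reported value stored even while an override is active is what lets unset restore it without waiting for the next report.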

-- 
Mgr. Šimon Tóth
