[torquedev] [Bug 67] Support for counted resources on nodes

"Mgr. Šimon Tóth" SimonT at mail.muni.cz
Mon Jul 5 07:10:11 MDT 2010


>>> Not really.  It's putting in the server what would normally be in the
>>> scheduler.  Schedulers should be parsing node resource information from
>>> MOMs and ensuring job resource requests for each node do not exceed the
>>> node's resource capacity.  Simon is putting a version of that logic in
>>> the server.  Then a simple qrun will "do the right thing" for all
>>> resources.
>>>
>>> Since a certain amount of site policy can go into resource allocation,
>>> I'm not sure this should go in the server.
>>
>> The general problem here is scalability. You really can't read resources
>> from nodes if you have a medium/large cluster.
>>
>> And you definitely can't do this once you have nodes on different
>> physical networks (network latency and connection breakdowns would just
>> totally kill the scheduler's performance).
>>
> 
> Yes, when I said "node resource information from MOMs" I meant via the
> server.  There's no need for the scheduler to contact every MOM - I agree
> that's a scaling and reliability nightmare - it can just get the info
> from the server.

Well, contacting every node is the default pbs_sched approach :-)

I don't know every Torque feature, but I'm not aware of any way to get
both the total and the used resources from the server. The node status
info only gives the total.
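
To make the total/used distinction concrete, here is a minimal sketch of
the per-node check under discussion. It is not Torque code: the function
name, the dictionary layout and the resource names (ncpus, gpus) are all
made up for illustration. The point is only that the check needs the used
counts as well as the totals.

```python
def fits_on_node(total, used, request):
    """Return True if every requested counted resource is still free.

    total, used and request are hypothetical dicts mapping a resource
    name to an integer count; a missing name is treated as zero.
    """
    for name, amount in request.items():
        available = total.get(name, 0) - used.get(name, 0)
        if amount > available:
            return False
    return True


# Hypothetical node: 8 CPUs with 6 in use, 2 GPUs with none in use.
node_total = {"ncpus": 8, "gpus": 2}
node_used = {"ncpus": 6, "gpus": 0}

print(fits_on_node(node_total, node_used, {"ncpus": 2, "gpus": 1}))
print(fits_on_node(node_total, node_used, {"ncpus": 4}))
```

With only the totals available, the second request would look fine even
though just 2 CPUs are actually free.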

-- 
Mgr. Šimon Tóth
