[torquedev] [Bug 67] Support for counted resources on nodes
"Mgr. Šimon Tóth"
SimonT at mail.muni.cz
Mon Jul 5 07:10:11 MDT 2010
>>> Not really. It's putting in the server what would normally be in the
>>> scheduler. Schedulers should be parsing node resource information from
>>> MOMs and ensuring job resource requests for each node do not exceed the
>>> node's resource capacity. Simon is putting a version of that logic in
>>> the server. Then a simple qrun will "do the right thing" for all
>>> resources.
>>>
>>> Since a certain amount of site policy can go into resource allocation,
>>> I'm not sure this should go in the server.
>>
>> The general problem here is scalability. You really can't read resources
>> from nodes if you have a medium/large cluster.
>>
>> And you definitely can't do this once you have nodes on different
>> physical networks (network latency and connection breakdowns would just
>> totally kill the scheduler's performance).
>>
>
> Yes, when I said "node resource information from MOMs" I meant via the
> server. There's no need for the scheduler to contact every MOM - I agree
> that's a scaling and reliability nightmare - it can just get the info
> from the server.

Well, contacting every node is the default pbs_sched approach :-)
I don't know every Torque feature, but I'm not aware of any way to get
both total and used resources from the server. As far as I know, the
node status info only provides the totals.
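
To make the point concrete, here is a minimal sketch of the check being discussed: a job fits on a node only if each counted resource it requests is no larger than total minus used. This is an illustration only. The function name, the dictionaries and the resource names ("ncpus", "gpus") are hypothetical and are not Torque APIs or real node attributes. It shows why the totals alone are not enough: without the used counts, the available amount cannot be computed.

```python
# Hypothetical sketch; none of these names come from Torque itself.

def fits(total, used, request):
    """Return True if every requested counted resource fits in what is left.

    total, used and request map a resource name to an integer count.
    A resource missing from total is treated as having capacity 0,
    and a resource missing from used is treated as completely free.
    """
    for name, amount in request.items():
        available = total.get(name, 0) - used.get(name, 0)
        if amount > available:
            return False
    return True


# Assumed example data for a single node.
node_total = {"ncpus": 8, "gpus": 2}
node_used = {"ncpus": 6, "gpus": 0}

print(fits(node_total, node_used, {"ncpus": 2, "gpus": 1}))  # True
print(fits(node_total, node_used, {"ncpus": 4}))             # False
```

Whether this check lives in the server or in the scheduler, it needs the same two inputs per node, which is the information I don't see how to get from the server today.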
--
Mgr. Šimon Tóth