[torquedev] [Bug 67] Support for counted resources on nodes
David Singleton
David.Singleton at anu.edu.au
Mon Jul 5 06:55:08 MDT 2010
On 07/05/2010 10:33 PM, "Mgr. Šimon Tóth" wrote:
>> Not really. It's putting in the server what would normally be in the
>> scheduler. Schedulers should be parsing node resource information from
>> MOMs and ensuring job resource requests for each node do not exceed the
>> node's resource capacity. Simon is putting a version of that logic in
>> the server. Then a simple qrun will "do the right thing" for all
>> resources.
>>
>> Since a certain amount of site policy can go into resource allocation,
>> I'm not sure this should go in the server.
>
> The general problem here is scalability. You really can't read resources
> from nodes if you have a medium/large cluster.
>
> And you definitely can't do this once you have nodes on different
> physical networks (network latency and connection breakdowns would just
> totally kill the scheduler's performance).
>
Yes, when I said "node resource information from MOMs" I meant via the
server. There's no need for the scheduler to contact every MOM - I agree
that's a scaling and reliability nightmare - it can just get the info
from the server.
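
To make that concrete, here is a rough sketch of the scheduler-side check, assuming the scheduler already holds a single snapshot of node state obtained from the server. The snapshot layout, the resource names ("gpus", "licenses") and the helpers fits() and pick_node() are all hypothetical, made up for illustration; none of this is the actual TORQUE API.

```python
# Hypothetical sketch: the snapshot stands in for node information the
# scheduler fetched from the server in one query (no per-MOM contact).

def fits(capacity, used, request):
    """True if every counted resource in the request fits on the node."""
    return all(
        used.get(name, 0) + amount <= capacity.get(name, 0)
        for name, amount in request.items()
    )

def pick_node(snapshot, request):
    """Return the first node that can hold the request, or None."""
    for node, info in snapshot.items():
        if fits(info["capacity"], info["used"], request):
            return node
    return None

# Assumed example data, not real cluster output.
snapshot = {
    "node01": {"capacity": {"gpus": 2, "licenses": 4}, "used": {"gpus": 2}},
    "node02": {"capacity": {"gpus": 2, "licenses": 4}, "used": {"gpus": 1}},
}

chosen = pick_node(snapshot, {"gpus": 1, "licenses": 2})
```

Site policy (which node to prefer, what counts as "used") would live in pick_node() on the scheduler side, which is why I'm wary of fixing it in the server.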
David