[torquedev] [Bug 67] Support for counted resources on nodes

David Singleton David.Singleton at anu.edu.au
Mon Jul 5 06:55:08 MDT 2010


On 07/05/2010 10:33 PM, "Mgr. Šimon Tóth" wrote:
>> Not really.  It's putting in the server what would normally be in the
>> scheduler.  Schedulers should be parsing node resource information from
>> MOMs and ensuring job resource requests for each node do not exceed the
>> node's resource capacity.  Simon is putting a version of that logic in
>> the server.  Then a simple qrun will "do the right thing" for all
>> resources.
>>
>> Since a certain amount of site policy can go into resource allocation,
>> I'm not sure this should go in the server.
>
> The general problem here is scalability. You really can't read resources
> from nodes if you have a medium/large cluster.
>
> And you definitely can't do this once you have nodes on different
> physical networks (network latency and connection breakdowns would just
> totally kill the scheduler's performance).
>

Yes, when I said "node resource information from MOMs" I meant via the
server.  There's no need for the scheduler to contact every MOM - I agree
that's a scaling and reliability nightmare - it can just get the info
from the server.
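
To make the division of labour concrete, here is a rough sketch of the kind
of check I mean on the scheduler side. It is only an illustration, not TORQUE
code: the NodeStatus record, the fits() and pick_node() helpers, and the
resource names "ncpus" and "gpus" are all made up for the example. The only
assumption is that the scheduler holds a single snapshot of per-node counted
resources obtained from the server, so no MOM is contacted.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional


@dataclass
class NodeStatus:
    """Hypothetical snapshot of one node, as obtained from the server."""
    name: str
    capacity: Dict[str, int]                             # total counted resources
    used: Dict[str, int] = field(default_factory=dict)   # already allocated


def fits(node: NodeStatus, request: Dict[str, int]) -> bool:
    """Return True if every requested amount fits in the node's free capacity."""
    for resource, amount in request.items():
        # A resource the node does not report counts as zero capacity.
        free = node.capacity.get(resource, 0) - node.used.get(resource, 0)
        if amount > free:
            return False
    return True


def pick_node(nodes: Iterable[NodeStatus], request: Dict[str, int]) -> Optional[str]:
    """Return the name of the first node that can hold the request, else None."""
    for node in nodes:
        if fits(node, request):
            return node.name
    return None


# Illustrative snapshot; a real scheduler would build this from server data.
snapshot = [
    NodeStatus("node01", {"ncpus": 8, "gpus": 2}, {"ncpus": 8}),
    NodeStatus("node02", {"ncpus": 8, "gpus": 2}, {"ncpus": 4, "gpus": 1}),
]
chosen = pick_node(snapshot, {"ncpus": 4, "gpus": 1})
```

First-fit is used here only to keep the sketch short; which node to prefer
is exactly the sort of site policy that would live in the scheduler.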

David

