[torquedev] [Bug 67] Support for counted resources on nodes

"Mgr. Šimon Tóth" SimonT at mail.muni.cz
Mon Jul 5 06:33:21 MDT 2010


> Not really.  It's putting in the server what would normally be in the
> scheduler.  Schedulers should be parsing node resource information from
> MOMs and ensuring job resource requests for each node do not exceed the
> node's resource capacity.  Simon is putting a version of that logic in
> the server.  Then a simple qrun will "do the right thing" for all
> resources.
> 
> Since a certain amount of site policy can go into resource allocation,
> I'm not sure this should go in the server.

The general problem here is scalability. The scheduler really can't read
resources directly from the nodes once you have a medium or large cluster.

And you definitely can't do this once you have nodes on different
physical networks (network latency and connection breakdowns would just
totally kill the scheduler's performance).

The third reason is probably only an issue for us locally. You can't read
resources from nodes (actually you can, but you need the exact same
logic mirrored on the server) if you have multiple schedulers per server.
The server has to be the authority here and has to verify each run
request to prevent race conditions.
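
To make the race-condition point concrete, here is a minimal sketch in
Python. It is not TORQUE code: the ResourceLedger class, its method names
and the resource names are all hypothetical. It only shows the idea that
the check and the reservation have to happen in one atomic step on the
server, so that two schedulers cannot both be granted the last unit of a
resource.

```python
import threading


class ResourceLedger:
    """Hypothetical server-side ledger of counted resources per node."""

    def __init__(self, capacity):
        # capacity: {node_name: {resource_name: total_count}}
        self._capacity = {n: dict(r) for n, r in capacity.items()}
        self._used = {n: {k: 0 for k in r} for n, r in capacity.items()}
        self._lock = threading.Lock()

    def try_run(self, node, request):
        """Atomically check and reserve `request` on `node`.

        Returns True if the resources were reserved, False if rejected.
        """
        with self._lock:
            cap = self._capacity.get(node)
            if cap is None:
                return False
            used = self._used[node]
            # First verify the whole request, then commit it, so a
            # rejected request never leaves a partial reservation behind.
            for res, amount in request.items():
                if res not in cap or used[res] + amount > cap[res]:
                    return False
            for res, amount in request.items():
                used[res] += amount
            return True

    def release(self, node, request):
        """Return previously reserved resources when a job ends."""
        with self._lock:
            used = self._used[node]
            for res, amount in request.items():
                used[res] = max(0, used[res] - amount)


# Two schedulers ask for the same single license on node1.
ledger = ResourceLedger({"node1": {"gpu": 2, "license": 1}})
first = ledger.try_run("node1", {"license": 1})
second = ledger.try_run("node1", {"license": 1})
```

If each scheduler did this check on its own copy of the node state, both
requests could pass. With the server as the single authority, only one of
them can.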

Plus having resources on the server makes a lot of sense. Nodes should not
be authoritative in reporting resources. Admins should have the ability
to set any type of resource in the nodes file.

This actually goes even beyond the standard resources, as you can set
pretty much anything as a resource: licenses, graphics cards, special HW
attached to certain nodes, etc.
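
As a rough illustration of admin-defined counted resources, the Python
sketch below parses a nodes-file style line into per-node counts and checks
a request against them. The "name key=count" layout for arbitrary resources
is an assumption made for this example (the exact syntax is not given in
this message), and parse_node_line and fits are hypothetical helpers.

```python
def parse_node_line(line):
    """Parse 'name key=count ...' into (name, {key: count}).

    Assumed format for illustration only. Tokens without '=' or with a
    non-integer value are ignored; comment-only or empty lines give None.
    """
    tokens = line.split("#", 1)[0].split()
    if not tokens:
        return None
    name, counted = tokens[0], {}
    for tok in tokens[1:]:
        key, sep, value = tok.partition("=")
        if sep and value.isdigit():
            counted[key] = int(value)
    return name, counted


def fits(capacity, request):
    """True if every requested amount is within the node's capacity."""
    return all(capacity.get(res, 0) >= amount
               for res, amount in request.items())


# Hypothetical line: 8 processors, 4 licenses and 2 graphics cards.
name, capacity = parse_node_line("node1 np=8 license=4 gpu=2  # example")
```

Because the counts are just named integers, the same check covers
licenses, graphics cards or any other resource the admin chooses to
define, without the node having to report it.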

-- 
Mgr. Šimon Tóth
