[torqueusers] Scheduling advice

Baker D.J. D.J.Baker at soton.ac.uk
Thu Apr 6 09:18:43 MDT 2006


Hi,

Our cluster has recently become far more complex, and we have put some
interim scheduling policies in place until we can work out something
better. I wonder if there are cluster administrators in the community
who could advise us, please. I suspect others have similar setups in
place.

Basically we have one large torque/maui controlled cluster consisting
of...

1. Single-core nodes -- all departments
2. Dual-core nodes -- all departments
3. Departmental nodes -- 4 dual-core nodes for chemistry, 8 dual-core
nodes for eScience, 5 single-core nodes for Sound/Vibration.

The nodes in (1) are older; all users can use them by default, so
access is trivial. For the new nodes (2 and 3) we have devised a simple
scheme to control access based on switch boundaries. For example, for
nodes in (2), we have...

NODECFG[purple301] FEATURES=switch10
...
NODECFG[purple332] FEATURES=switch11
...

Switches 10 and 11 aren't defined in the maui NODESETLIST, so users
must specify the appropriate switch(es) on their qsub command line.
Above all, we want to ensure that jobs never grab a mix of nodes from
(1) and (2). Clunky, but it works.

For nodes in (3), we have followed the same "switch" idea, but have
also defined a standing reservation to limit user access. Of course,
users can still let their jobs spill over into the main facility by
doing something like:

qsub -W x=NODESET:ONEOF:FEATURE:switch10:switch11:escience ...

Overall, I think this scheme is clunky and could be improved upon,
although we are writing a script to hide the details. Could anyone
with more experience of setting up large systems please advise us by
suggesting possible setups based on queues, partitions, etc.? An
interesting question comes to mind: in a torque/maui system, is it
possible for queued jobs to migrate from one queue to another if
resources are busy?
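
To give an idea of the kind of wrapper script mentioned above, here is a
minimal sketch rather than our actual script. The feature names
(switch10, switch11, escience) are the ones shown earlier; the function
name build_nodeset, the "main" keyword and the job file job.sh are
made up purely for illustration. It only prints the qsub command (a dry
run) instead of submitting anything.

```shell
#!/bin/sh
# Hypothetical wrapper sketch: builds the NODESET extension string so that
# users do not have to type the switch features by hand.
# switch10, switch11 and escience come from the configuration above;
# build_nodeset, "main" and job.sh are illustrative assumptions.

# Features of the shared dual-core nodes, group (2).
MAIN_FEATURES="switch10:switch11"

# Print the value for "qsub -W x=..." given an optional departmental feature.
build_nodeset() {
    dept="$1"
    if [ -z "$dept" ] || [ "$dept" = "main" ]; then
        # No department given: restrict the job to the shared dual-core nodes.
        printf 'NODESET:ONEOF:FEATURE:%s\n' "$MAIN_FEATURES"
    else
        # Departmental jobs may also spill over onto the shared dual-core nodes.
        printf 'NODESET:ONEOF:FEATURE:%s:%s\n' "$MAIN_FEATURES" "$dept"
    fi
}

# Dry run: show the command instead of submitting it.
echo qsub -W "x=$(build_nodeset escience)" job.sh
```

Run as is, this prints the same extension string as the qsub example
above, followed by the placeholder job file name. A real wrapper would
replace the echo with the actual submission and would presumably check
the user's department first.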

Thanks -- David.



