[Mauiusers] Scheduling advice
troy at osc.edu
Thu Apr 6 10:01:03 MDT 2006
On Thu, 2006-04-06 at 16:18 +0100, Baker D.J. wrote:
> Our cluster has recently got far more complex, and we have put some
> interim scheduling policies in place until we can work out something
> better. I wonder if there are cluster administrators in the community
> who could advise us, please. I suspect others have similar set ups in
> place. Basically we have one large torque/maui controlled cluster
> consisting of:
> 1. Single core nodes -- all departments
> 2. Dual core nodes -- all departments
> 3. Departmental nodes -- 4 nodes for chemistry (dual cores), 8 nodes for
> eScience (dual cores), 5 single core nodes for Sound/Vibration.
> The nodes in (1) are older, and all users can use them by default and
> access is trivial. For the new nodes (2 and 3) we have devised a simple
> scheme to control access based on switch boundaries. For example, for
> nodes in (2), we have...
> NODECFG[purple301] FEATURES=switch10
> NODECFG[purple332] FEATURES=switch11
> Switches 10 and 11 aren't defined in the maui NODESETLIST, and so users
> must specify the appropriate switch(es) on their qsub command. Above all
> we want to ensure that jobs don't ever grab a mix of nodes from (1) and
> (2). Clunky, but works.
> For nodes in (3), again we have followed the same "switch" idea, but
> have also defined a standing reservation to limit user access. Also, of
> course, users can ensure that their jobs can spill over into the main
> facility by doing something like:
> qsub -W x=NODESET:ONEOF:FEATURE:switch10:switch11:escience ...
> Above all I think this scheme is clunky, and could be improved upon(?)
> -- we are writing a script to hide the details, however. Could anyone
> with more experience of setting up large systems please advise us by
> suggesting possible set ups based on queues, partitions, etc. An
> interesting question comes to mind... in a torque/maui system, is it
> possible for queued jobs to migrate from one queue to another if
> resources are busy?
There are multiple ways to do this. In the past, I've used a
combination of partitions and standing reservations to segregate jobs
onto different types of nodes based on their class/queue. For instance,
on our Pentium 4 cluster we have 112 nodes with Infiniband hardware
(partition "parallel" in Maui/Moab) and 144 nodes without (partition
"serial"). There are corresponding queues in PBS/TORQUE that are routed
to by a default queue called "batch"; users are encouraged not to
specify a queue, but rather to specify the resources they need.
Standing reservations then enforce access control on the nodes in the
partitions to particular queues/classes of jobs, and the partitions keep
jobs from spanning across multiple classes of nodes.
In your case, it sounds like your fundamental access control is on the
department level. If everyone in a department is in the same UNIX
group, you can key off of that; however, a more likely scenario is that
UNIX groups are set on research group boundaries, in which case you'll
want to construct accounts (a sort of meta-group in Maui) for each
department.
# partitions -- dualcore, chem, escience & sndvib
# Single-core nodes won't have a partition ID set and will end up in
# the DEFAULT partition.
# Note that you will need to increase MMAX_MPAR to 6 in maui.h to allow
# 4 user-defined partitions (plus ALL and DEFAULT).
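
For illustration only, here is a minimal sketch of how the partition, account, and standing reservation pieces described above might fit together in maui.cfg. It is not the configuration from the original message. The node names chem01 through chem04, the UNIX group names, and the account names are hypothetical, and the PLIST delimiter and the ADEF setting on GROUPCFG should be checked against the Maui version in use.

```
# maui.cfg sketch -- all names below except purple301/purple332 are hypothetical

# Put each new node into a partition (one NODECFG line per node).
NODECFG[purple301]  PARTITION=dualcore
NODECFG[purple332]  PARTITION=dualcore
NODECFG[chem01]     PARTITION=chem
NODECFG[chem02]     PARTITION=chem
NODECFG[chem03]     PARTITION=chem
NODECFG[chem04]     PARTITION=chem

# Map research-group UNIX groups onto a departmental account.
# (Assumes GROUPCFG accepts ADEF; see the remark that follows.)
GROUPCFG[chemgrpA]  ADEF=chemistry
GROUPCFG[chemgrpB]  ADEF=chemistry

# Let the chemistry account run in the shared partitions and its own one.
# Because a job runs within one partition, it cannot mix node types.
ACCOUNTCFG[chemistry]  PLIST=DEFAULT:dualcore:chem  PDEF=DEFAULT

# Standing reservation restricting the chemistry nodes to that account.
SRCFG[chem]  PERIOD=INFINITY
SRCFG[chem]  HOSTLIST=chem01,chem02,chem03,chem04
SRCFG[chem]  ACCOUNTLIST=chemistry
```

The escience and sndvib partitions would follow the same pattern, each with its own account and standing reservation.
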
I'm a little surprised that the Maui docs don't list ADEF as a valid
parameter to GROUPCFG, as I'm pretty sure I used it in Maui before we
converted over to using Moab.
Hope this helps,
Troy Baer troy at osc.edu
Science & Technology Support http://www.osc.edu/hpc/
Ohio Supercomputer Center 614-292-9701