A.1 Case Study: Mixed Parallel/Serial Homogeneous Cluster
A multi-user site wishes to control the distribution of compute cycles while minimizing job turnaround time and maximizing overall system utilization.
Resources: 64 2-way SMP Linux-based nodes, each with 512 MB of RAM and 16 GB of local scratch space
Job Size: jobs range in size from 1 to 32 processors with approximately the following quartile job frequency distribution
Job Length: jobs range in length from 1 to 24 hours
Job Owners: jobs are submitted from 6 major groups consisting of a total of about 50 users
During prime time hours, the majority of jobs submitted are smaller, short-running
development jobs in which users are testing out new code and new data
sets. The owners of these jobs are often unable to proceed with their
work until a job they have submitted completes. Many of these jobs
are interactive in nature. Throughout the day, a larger, longer-running
production workload is also submitted, but these jobs do not have comparable
turnaround time pressure.
Constraints: (Must do)
The groups 'Meteorology' and 'Statistics'
should receive approximately 45% and 35% of the total delivered cycles, respectively.
Nodes cannot be shared amongst tasks from different jobs.
Goals: (Should do)
The system should attempt to minimize turnaround
time during primetime hours (Mon - Fri, 8:00 AM to 5:00 PM) and maximize
system utilization during all other times. System maintenance
should be scheduled efficiently.
The network topology is flat and the nodes are homogeneous. This makes life significantly simpler. The focus for this site is controlling the distribution of compute cycles without negatively impacting overall system turnaround and utilization. Currently, the best mechanism for doing this is Fairshare. This feature can be used to adjust the priority of jobs to favor or disfavor them based on fairshare targets and historical usage. In essence, it improves the turnaround time of jobs not meeting their fairshare target at the expense of those that are. Depending on the criticality of the delivered cycle distribution constraints, this site might also wish to consider an allocations bank such as PNNL's QBank, which enables more stringent control over the amount of resources that can be delivered to various users.
To manage the primetime job turnaround constraints, a standing reservation would probably be the best approach. A standing reservation can be used to set aside a subset of the nodes for quick-turnaround jobs. This reservation can be configured with a time-based access point, allowing only jobs that will complete within some time X to utilize these resources. In this case, the reservation has advantages over a typical queue-based solution in that these quick-turnaround jobs can run anywhere resources are available, either inside or outside the reservation, or even crossing reservation boundaries. The site does not have any hard constraints about what is acceptable turnaround time, so the best approach would probably be to analyze the site's workload under a number of configurations using the simulator and observe the corresponding scheduling behavior.
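Such a standing reservation might be sketched in maui.cfg roughly as follows. This is only an illustration: the reservation name, node count, and one-hour time limit are assumptions that a site would tune against its actual workload, and standing reservation syntax varies somewhat between Maui versions.

```
# 'fast': a primetime standing reservation for quick-turnaround jobs
# (the name, TASKCOUNT, and MAXTIME values here are illustrative)
SRCFG[fast]  TASKCOUNT=16
SRCFG[fast]  DAYS=MON,TUE,WED,THU,FRI
SRCFG[fast]  STARTTIME=8:00:00 ENDTIME=17:00:00
# only admit jobs which will complete within one hour
SRCFG[fast]  MAXTIME=1:00:00
```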
For general optimization, there are a number of scheduling
aspects to consider: scheduling algorithm, reservation policies, node allocation
policies, and job prioritization. It is almost always a good idea
to utilize the scheduler's backfill
capability since this has a tendency to increase average system utilization
and decrease average turnaround time in a surprisingly fair manner.
It does tend to favor smaller, shorter jobs over others, which is
exactly what this site desires. Reservation policies are often best
left alone unless rare starvation issues arise or quality of service policies
are desired. Node allocation policies are effectively meaningless
since the system is homogeneous. The final scheduling aspect, job
prioritization, can play a significant role in meeting site goals.
To maximize overall system utilization, maintaining a significant Resource
priority factor will favor large resource (processor) jobs, pushing them
to the front of the queue. Large jobs, though often only a small
portion of a site's job count, regularly account for the majority of a
site's delivered compute cycles. To minimize job turnaround, a
significant XFactor priority factor will favor short-running jobs. Finally, in order
for fairshare to be effective, a significant Fairshare
priority factor must be included.
For this scenario, a resource manager configuration consisting of a single, global queue/class with no constraints would allow Maui the maximum flexibility and opportunities for optimization.
The following Maui configuration, which prioritizes jobs for Fairshare, XFactor, and Resources and disables SMP node sharing, would be a good initial stab.
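A minimal sketch of such a maui.cfg is shown below, assuming Maui 3.x parameter names; the weight values are illustrative starting points rather than tuned recommendations:

```
# prioritize jobs for Fairshare, XFactor, and Resources
FSWEIGHT          100
XFACTORWEIGHT     100
RESWEIGHT          20

# fairshare targets for the two major groups (from the site constraints)
FSPOLICY          DEDICATEDPS
GROUPCFG[Meteorology] FSTARGET=45.0
GROUPCFG[Statistics]  FSTARGET=35.0

# enable backfill scheduling
BACKFILLPOLICY    FIRSTFIT

# disable SMP node sharing
NODEACCESSPOLICY  DEDICATED
```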
The command 'diagnose -f' will allow you to
monitor the effectiveness of the fairshare component of your job prioritization.
Adjusting the Fairshare priority factor up or down will make fairshare
more or less effective. Note that a tradeoff must occur between fairshare
and other goals managed via job prioritization. 'diagnose -p' will
help you analyze the priority distributions of the currently idle jobs.
The 'showgrid AVGXFACTOR' command will provide a good indication of average
job turnaround while the 'profiler' command will give an excellent analysis
of longer term historical performance statistics.
Any priority configuration will need to be tuned
over time because the effect of priority weights is highly dependent upon
the site-specific workload. Additionally, the priority weights themselves
are part of a feedback loop which adjusts the site workload. However,
most sites stabilize quickly, and significant priority tuning becomes unnecessary
after a few days.
© 2001-2010 Adaptive Computing Enterprises, Inc.