[Mauiusers] Bug in Maui preemption algorithm
Kevin Hildebrand
kevin at umd.edu
Fri Oct 24 08:32:30 MDT 2008
Hello, I've discovered what I guess should be classified as a bug in the
Maui preemption algorithm.
If a collection of preemptible jobs is running such that they do not use
all available processors on a given node, and a higher-priority preemptor
comes along, the higher-priority job will either be blocked or will
preempt too many of the running jobs.
For instance, in a cluster of 16 nodes with 4gb RAM and np=4, say there
are 16 single-processor preemptible jobs running (nodes=1,ppn=1,mem=4gb).
Because each job requires 4gb of RAM, only one job runs on each of the 16
nodes.
Now, if you attempt to schedule a job, say (nodes=8,ppn=4), this should in
theory preempt 8 of the running jobs and then run on the 8 nodes freed
up. However, what actually happens is that the job is blocked.
The logic in MJobSelectPJobList() counts up the preemptible jobs and the
CPU counts they're actually using (in this case 16) and compares that
against the needed number of CPUs (in this case 32) and then fails because
32>16.
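
To make the arithmetic concrete, here is a toy Python model of the check as
described above. It is not Maui's code: the RunningJob type and the function
name are made up for illustration, and the only thing taken from the post is
that CPUs actually in use by preemptible jobs are summed and compared with
the request.

```python
from dataclasses import dataclass


@dataclass
class RunningJob:
    node: int          # node the job is running on
    cpus_in_use: int   # processors the job actually occupies
    preemptible: bool


def can_preempt_by_used_cpus(jobs, cpus_needed):
    """Hypothetical stand-in for the check described in the post.

    Only CPUs held by preemptible jobs are counted; idle CPUs on the
    same nodes are ignored.
    """
    counted = sum(j.cpus_in_use for j in jobs if j.preemptible)
    return counted >= cpus_needed, counted


# 16 nodes, one single-processor preemptible job on each
jobs = [RunningJob(node=n, cpus_in_use=1, preemptible=True) for n in range(16)]

# the preemptor asks for nodes=8,ppn=4, i.e. 32 CPUs
ok, counted = can_preempt_by_used_cpus(jobs, cpus_needed=8 * 4)
print(ok, counted)  # False 16
```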
In reality, the code should probably include idle CPUs in the count, so
that the preemption candidates would indicate 64 available CPUs instead of
16, and the job would then preempt and run properly.
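
A rough sketch of that idea in Python, under the assumption that every CPU
on a node hosting a preemption candidate becomes usable once the candidate
is evicted, except CPUs held by non-preemptible jobs. The data layout and
the function name are hypothetical and do not come from the Maui source.

```python
NP = 4  # processors per node (np=4 in the example cluster)


def cpus_obtainable(nodes, np=NP):
    """Count CPUs a preemptor could get on nodes hosting preemptible jobs.

    nodes maps a node id to a list of (cpus_in_use, preemptible) tuples.
    Idle CPUs and CPUs held by preemptible jobs both count; CPUs held by
    non-preemptible jobs do not.
    """
    total = 0
    for jobs in nodes.values():
        if not any(preemptible for _, preemptible in jobs):
            continue  # no preemption candidate on this node
        pinned = sum(cpus for cpus, preemptible in jobs if not preemptible)
        total += np - pinned
    return total


# 16 nodes, each running one single-processor preemptible job
nodes = {n: [(1, True)] for n in range(16)}

available = cpus_obtainable(nodes)
print(available, available >= 8 * 4)  # 64 True
```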
This leads to the second problem: because the preemption logic only
counts one CPU per preemptible job, if someone submits a smaller job (say
nodes=2,ppn=4, which should only require the preemption of two jobs),
Maui instead evicts 8 jobs, runs the preempting job, and then restarts the
remaining 6 jobs. This causes a bunch of unnecessary restarts.
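
The over-preemption can be reproduced with a small Python model. The greedy
loop below is an assumption about how candidates get picked (the post only
reports the outcome), and pick_preemptees and the credit functions are
invented names. Crediting a whole node per evicted job also assumes the
evicted job was the only thing running on that node, as in the example.

```python
NP = 4  # processors per node (np=4 in the example cluster)


def pick_preemptees(jobs, cpus_needed, credit):
    """Greedily pick jobs until the credited CPUs cover the request.

    credit(job) is how many CPUs evicting that job is assumed to free.
    """
    chosen, freed = [], 0
    for job in jobs:
        if freed >= cpus_needed:
            break
        chosen.append(job)
        freed += credit(job)
    return chosen


# 16 single-processor preemptible jobs, one per node
jobs = [{"node": n, "cpus_in_use": 1} for n in range(16)]
cpus_needed = 2 * NP  # nodes=2,ppn=4

# crediting only the CPU each job uses, as the post describes
by_used = pick_preemptees(jobs, cpus_needed, credit=lambda j: j["cpus_in_use"])
# crediting the whole node each eviction frees up
by_node = pick_preemptees(jobs, cpus_needed, credit=lambda j: NP)

print(len(by_used), len(by_node))  # 8 2
```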
I've been looking at ideas on how to fix this, but haven't come up with
anything yet. I'm hoping someone with more familiarity with the code might
be able to suggest something.
Thanks,
Kevin Hildebrand
University of Maryland, College Park