[Mauiusers] Bug in Maui preemption algorithm

Kevin Hildebrand kevin at umd.edu
Fri Oct 24 08:32:30 MDT 2008


Hello, I've discovered what I guess should be classified as a bug in the 
Maui preemption algorithm.

If a collection of preemptible jobs is running such that the jobs do not
use all available processors on a given node, and a higher-priority
preemptor comes along, the higher-priority job will either be blocked or
will preempt too many of the running jobs.

For instance, in a cluster of 16 nodes, each with 4 GB of RAM and np=4,
say there are 16 single-processor preemptible jobs running
(nodes=1,ppn=1,mem=4gb).  Because each job requires 4 GB of RAM, only one
job runs on each of the 16 nodes, leaving 3 of the 4 processors on every
node idle.

Now, if you attempt to schedule a job, say (nodes=8,ppn=4), this should in
theory preempt 8 of the running jobs and then run on the 8 nodes freed
up.  What actually happens is that the job is blocked.  The logic in
MJobSelectPJobList() adds up the CPUs that the preemptible jobs are
actually using (in this case 16), compares that against the number of CPUs
needed (in this case 32), and then fails because 32>16.
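
To make the arithmetic concrete, here is a minimal Python sketch of the
check as I understand it.  It is not the Maui source: the function and
field names are made up for illustration, and the only assumption is that
the comparison is between CPUs in use by preemptible jobs and CPUs
requested.

```python
# Hypothetical model of the feasibility check described above.  This is
# not the Maui source; every name here is made up for illustration.

def preemptible_cpus_in_use(jobs):
    """Sum the CPUs actually held by the preemptible jobs."""
    return sum(job["cpus"] for job in jobs)


def can_preempt(jobs, cpus_needed):
    """The check as described: only CPUs in use by preemptees count."""
    return preemptible_cpus_in_use(jobs) >= cpus_needed


# 16 nodes, one single-CPU preemptible job on each.
running = [{"node": n, "cpus": 1} for n in range(16)]
needed = 8 * 4  # nodes=8,ppn=4

print(preemptible_cpus_in_use(running))  # 16
print(can_preempt(running, needed))      # False, since 32 > 16
```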

The code should probably also count the idle CPUs on the nodes where the
preemptible jobs are running, so that the preemption candidates would
indicate 64 available CPUs (16 nodes x 4 CPUs) instead of 16, and the job
would then preempt and run properly.
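
A rough Python sketch of the count I have in mind is below.  The node
fields and function names are hypothetical, and it assumes that preempting
every preemptible job on a node makes that node's idle CPUs usable as
well.

```python
# Hypothetical sketch of a node-aware count; not Maui code.
# Assumption: once the preemptible jobs on a node are evicted, the node's
# idle CPUs become usable too (here they were idle only because the
# running job held all of the node's memory).

def reclaimable_cpus(node):
    """CPUs a preemptor could get on this node: idle plus preemptible."""
    idle = node["np"] - node["preemptible_cpus"] - node["other_cpus"]
    return idle + node["preemptible_cpus"]


# 16 nodes with np=4, each running one single-CPU preemptible job.
nodes = [{"np": 4, "preemptible_cpus": 1, "other_cpus": 0}
         for _ in range(16)]

available = sum(reclaimable_cpus(n) for n in nodes)
needed = 8 * 4  # nodes=8,ppn=4

print(available)            # 64
print(available >= needed)  # True
```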

This leads to the second problem: because the preemption logic counts
only the one CPU that each preemptible job is using, if someone submits a
smaller job (say nodes=2,ppn=4, which needs 8 CPUs and should only require
the preemption of two jobs), Maui instead evicts 8 jobs, runs the new job
on two nodes, and then restarts the remaining 6 jobs.  This causes a bunch
of unnecessary restarts.
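
The over-preemption can be illustrated with a small Python sketch.  Again
this is only a model with made-up names, not Maui code: it walks the
preemptible jobs in order and stops once the CPUs credited to the chosen
jobs cover the request.  Crediting each job with only its own CPU picks 8
victims; crediting it with the whole node (an assumption that holds here
because each job is alone on its node) picks 2.

```python
# Hypothetical model of victim selection; not Maui code.

def select_victims(jobs, cpus_needed, credit):
    """Pick jobs in order until the credited CPUs cover the request."""
    chosen, covered = [], 0
    for job in jobs:
        if covered >= cpus_needed:
            break
        chosen.append(job)
        covered += credit(job)
    return chosen


# One single-CPU preemptible job on each of 16 nodes with np=4.
jobs = [{"node": n, "cpus": 1, "node_np": 4} for n in range(16)]
needed = 2 * 4  # nodes=2,ppn=4

# Credit only the CPU the job is using (the behaviour described above).
by_job_cpus = select_victims(jobs, needed, lambda j: j["cpus"])

# Credit the whole node, assuming the job is the only one on it.
by_node_cpus = select_victims(jobs, needed, lambda j: j["node_np"])

print(len(by_job_cpus))                      # 8
print(len(by_node_cpus))                     # 2
print(len(by_job_cpus) - len(by_node_cpus))  # 6 unnecessary evictions
```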

I've been looking at ideas on how to fix this, but haven't come up with
anything yet.  I'm hoping someone with more familiarity with the code
might be able to suggest something.

Thanks,

Kevin Hildebrand
University of Maryland, College Park


