[Mauiusers] Re: Suspended jobs resume execution

Josh Butikofer josh at clusterresources.com
Fri Aug 18 10:02:46 MDT 2006


Robin,

Some of our deadlines have passed, and I was able to take time today to look at this suspension problem
in more detail. I have found that the solution is not just a simple fix in the code, but a
combination of settings and changes.

The first issue I investigated was why the suspended job's run-priority was not growing over time;
in other words, why the job was not "aging." In order to ensure the job's run-priority would grow,
even in a suspended state, I implemented a new job priority weight factor called
USAGEEXECUTIONTIMEWEIGHT. This, like other USAGE sub-component factors, is only applied to active
jobs and only works if the USAGEWEIGHT is set to something other than 0. A positive
USAGEEXECUTIONTIMEWEIGHT will cause jobs that have a start time (including suspended jobs, as they
were once started) to increase in run priority over time. With these settings the job should age
properly.
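
As a rough illustration of the aging behavior described above, here is a small Python sketch. It is
not Maui source code: the function name usage_priority, the multiplicative form, and the use of
seconds as the time unit are all assumptions made for the example.

```python
# Illustrative model only; NOT Maui's actual priority code.
# Assumption: the usage contribution is component weight * sub-component
# weight * time since the job first started, measured in seconds.

def usage_priority(usage_weight, exec_time_weight, start_time, now):
    """Return the assumed usage contribution for one job.

    start_time is None for a job that has never started; a suspended
    job keeps its original start time, so it continues to age.
    """
    if start_time is None:
        return 0  # factor applies only to jobs that have a start time
    elapsed = now - start_time
    return usage_weight * exec_time_weight * elapsed


if __name__ == "__main__":
    # A job started at t=0 and later suspended still gains priority.
    print(usage_priority(1, 1, start_time=0, now=600))
    # With USAGEWEIGHT set to 0 the factor has no effect.
    print(usage_priority(0, 1, start_time=0, now=600))
```

The point of the sketch is only that a job with a start time keeps accumulating priority while
suspended, and that a zero USAGEWEIGHT cancels the new factor entirely.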

In my testing, I also found that an internal Maui attribute named the "suspension min time" could 
sometimes get in the way of resuming the suspended job. This attribute's purpose is to prevent Maui 
from suspending and resuming and then suspending the same job within the same iteration. (It 
prevents rapid "flipping" of jobs.) A job will not resume after being suspended until after this min 
time has passed. Maui starts counting immediately after the job is suspended/resumed. This attribute 
was set to 60 seconds and if the PREEMPTOR job finished before this time, then the suspended job 
would not resume because the min time had not yet been satisfied. Even with a growing priority this 
min time could prevent jobs from being resumed. To reduce the chance of this happening, I decreased
the "suspension min time" from 60 seconds to 10 seconds.
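
The effect of the min time can be sketched in a few lines of Python. This is a simplified model, not
Maui code: the name may_resume, the >= comparison, and the 30-second PREEMPTOR run are assumptions
chosen for the example.

```python
# Simplified model of the "suspension min time" gate; NOT Maui code.

def may_resume(last_state_change, now, min_time):
    """True once at least min_time seconds have passed since the job
    was last suspended or resumed (all times in seconds)."""
    return now - last_state_change >= min_time


if __name__ == "__main__":
    suspended_at = 0
    preemptor_done = 30  # hypothetical: PREEMPTOR finishes after 30 s

    # Old setting: the job is still locked out when the PREEMPTOR ends.
    print(may_resume(suspended_at, preemptor_done, min_time=60))  # False
    # New setting: the job is eligible to resume.
    print(may_resume(suspended_at, preemptor_done, min_time=10))  # True
```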

The last way that this issue can exhibit itself is when an advanced reservation is blocking the 
suspended job's ability to resume. This happens only if the PREEMPTOR job's wallclock limit is 
less than or equal to the suspended job's wallclock limit.

For example, if we have two jobs in the queue with the same priority, A_low and B_low, and B_low was 
submitted second, then let's say A_low starts and takes up the nodes needed by B_low. So B_low is 
now in the Idle queue, but creates a reservation in the future so it can guarantee to run after 
A_low is complete. Next a PREEMPTOR job, C_high, comes in with a higher priority and suspends A_low 
so that C_high can run. The advanced reservation that B_low has will now be adjusted to fit "around" 
the new wallclock limit of C_high. If C_high runs shorter than A_low does, then B_low's advanced
reservation will move earlier in time. When C_high ends and A_low tries to resume, it won't be able
to, because B_low's advanced reservation will overlap A_low's run-length. If, however,
C_high was longer than A_low's wallclock, then A_low can still squeeze in before B_low's reservation 
begins.
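
The same scenario can be worked through with numbers. The Python sketch below is a simplified model,
not Maui's reservation logic. It assumes B_low's reservation is placed at C_high's start time plus
C_high's wallclock limit, and that A_low needs its full wallclock limit to fit; the function names
and all of the times are hypothetical.

```python
# Simplified model of the reservation conflict; NOT Maui code.
# All times are in seconds, measured from the moment C_high starts.

def b_reservation_start(c_start, c_wallclock):
    """Assumed placement of B_low's reservation: right after C_high's
    wallclock limit expires."""
    return c_start + c_wallclock


def can_resume(resume_time, needed_wallclock, reservation_start):
    """A_low may resume only if it would finish before the reservation."""
    return resume_time + needed_wallclock <= reservation_start


if __name__ == "__main__":
    a_wallclock = 7200   # A_low requested 2 hours
    c_actual_end = 1800  # C_high actually finishes after 30 minutes

    # Case 1: C_high's limit (1 h) is shorter than A_low's (2 h).
    res = b_reservation_start(0, 3600)
    print(can_resume(c_actual_end, a_wallclock, res))  # False

    # Case 2: C_high's limit (4 h) is longer than A_low's (2 h).
    res = b_reservation_start(0, 14400)
    print(can_resume(c_actual_end, a_wallclock, res))  # True
```

In case 1 A_low would run until 9000 seconds, well past the reservation at 3600, so it stays
suspended. In case 2 the reservation sits at 14400, leaving room for A_low to squeeze in.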

Perhaps the example was a little much, but I hope you get the idea. In Maui there is currently only 
one way to get around this: controlling the creation of advanced reservations. Depending on the 
needs of your cluster, you can disable advanced reservations altogether by using:

RESERVATIONPOLICY NEVER

in your maui.cfg. If this suspension problem really hurts the utilization of your cluster, then this
solution may work best for your site. Otherwise, it may be overkill.

In Moab Workload Manager you can enable lower priority reservations to be "preempted" as well, 
allowing for A_low to run no matter where B_low's reservation begins. Adding this feature to Maui 
would, unfortunately, be quite an extensive effort and I don't foresee us being able to implement 
it anytime soon.

All of the above changes have been included in the most recent development snapshot available at 
http://www.clusterresources.com/downloads/maui/.

Let me know if you experience any problems or have any more questions. We appreciate the continuing 
support from the Maui community and their active participation in resolving bugs and creating 
enhancements.

-- 
Joshua Butikofer
Cluster Resources, Inc.

josh at clusterresources.com
Voice: (801) 717-3707
Fax:   (801) 717-3738
--------------------------


Robin Humble wrote:
> Hi,
> 
> On Thu, Apr 27, 2006 at 10:32:40AM -0600, Josh Butikofer wrote:
>> We've confirmed that this behavior is happening in Maui. Moab Workload 
>> Manager currently has the desired behavior with suspended jobs accruing 
>> priority (and also correctly handles different classes involved). We 
>> hope that over the next few weeks we will be able to make these 
>> improvements in Maui as well. We will keep the list posted on our progress.
> 
> any updates?
> 
> in case you were looking for a simpler test case, the below 2 queue
> system seems to have the same behaviour as the previous bug report -
> ie. the suspended PREEMPTEE job has a hard time resuming.
> 
> in other words after a PREEMPTOR job steams through (correctly) we end
> up with a previously queued PREEMPTEE job then being chosen to run over
> the top of the suspended PREEMPTEE job.
> 
> I don't think this is correct behaviour as only PREEMPTOR jobs should
> be able to run over the top of PREEMPTEE jobs.
> 
> versions are:
> torque 2.1.1-3 (rebuild on AS4 i686 from the fc5 .src.rpm), maui 3.2.6p16
> 
> relevant part of maui.cfg:
> 
> PREEMPTPOLICY SUSPEND
> CLASSCFG[debug]      QDEF=high
> CLASSCFG[workq]      QDEF=low
> QOSCFG[high] PRIORITY=500 QFLAGS=PREEMPTOR
> QOSCFG[low] PRIORITY=100 QFLAGS=PREEMPTEE
> QOSWEIGHT       1
> 
> cheers,
> robin
> 
>> -- 
>> Joshua Butikofer
>> Cluster Resources, Inc.
>>
>> josh at clusterresources.com
>> (801) 798-7488
>> --------------------------
>>
>>
>> David Corredor wrote:
>>> The problem is not just that the suspended job gets once again preempted
>>> by a job of its same class from the IDLE queue, this happens regardless
>>> of the class of the new job.
>>>
>>>  Ex.  3 queues (1 verylong, 1 long, 1 fast.  Fast preempts long and
>>> verylong, and long preempts verylong, verylong should not preempt).
>>>    - Submit 1 long job so that it takes all resources in cluster.
>>>    - Submit a verylong job so that it waits in the IDLE queue.
>>>    - Submit a fast job.
>>>
>>>  The fast job preempts the long one, and once it finishes, instead of the
>>> long one to resume execution, the verylong kicks in and preempts it once
>>> again (and it shouldn't).
>>>
>>>
>>>
>>>
>>>
>>> <quote who="Ronny T. Lampert">
>>>
>>>> .....
>>>> However I experience the very same problem as you do (I need the
>>>> QUEUETIMEWEIGHT set to 1) - the preempted ones stay suspended and instead
>>>> a NEW job from the batch queue is started :-(
>>>>
>>>> I think this is a bug: suspended jobs *should age*, too.
>>>> Or automatically get a slightly higher priority than the highest in the
>>>> same class to prevent it from staying suspended and interrupted by jobs
>>>> from the same class.
>>>>
>>>> Could some developer shortly comment on that issue?
>>>>
>>>> Thanks!
>>>> Ronny
>>>>
>>>>
>>>
>>>
>>> _______________________________________________
>>> mauiusers mailing list
>>> mauiusers at supercluster.org
>>> http://www.supercluster.org/mailman/listinfo/mauiusers


