[Mauiusers] job preemption - can this be done?

Lennart Karlsson Lennart.Karlsson at nsc.liu.se
Tue Feb 17 09:06:32 MST 2009


Hi Bill,

You wrote:
> Running maui/torque on a number of clusters.  Have never used the 
> preemption stuff before, but after searching through the documentation, 
> it looks like what I'd like to do might not be possible here.
> 
> I'd like 4 classes/queues.  A preempts B, B preempts C, and D preempts C.
> 
> C is the general-purpose, totally preemptable class.  Both D and B can 
> get the bits they need by preempting jobs in this class.  And I believe 
> that this would be all fine and good, except for this A class, which 
> needs full access to the machine (less D class jobs) NOW!
> 
> 
> I've looked through the Maui docs as well as the Moab ones, to no avail. 
> Preliminary searching leads me to SGE and LSF, but it sure would be 
> nice to make this something Maui could handle.
> 
> Anyone know of a way, or something I may be missing?  I guess I could 
> kill jobs from a prologue script for A class jobs until I freed enough 
> cores.  I don't know.

If I understand your problem correctly, you can solve it like this:

# Short defer times, but allow many deferrals
DEFERSTARTCOUNT         3
DEFERTIME               0:00:50
DEFERCOUNT              500

# Put priority on QOS only.
# The zero weights are there so that b jobs do not preempt other b jobs,
# and c jobs do not preempt other c jobs.
QOSWEIGHT               1
QUEUETIMEWEIGHT         0
XFACTORWEIGHT           0

# Requeue preempted jobs. There are other policies, but I have not tried them.
PREEMPTIONPOLICY        REQUEUE

# Define the queues
CLASSCFG[a]      	QDEF=acos
CLASSCFG[b]      	QDEF=bcos
CLASSCFG[c]      	QDEF=ccos
CLASSCFG[d]     	QDEF=dcos

QOSCFG[acos]          	PRIORITY=500000 QFLAGS=PREEMPTOR
QOSCFG[bcos]          	PRIORITY=300000 QFLAGS=PREEMPTOR,PREEMPTEE
QOSCFG[ccos]          	PRIORITY=100000 QFLAGS=PREEMPTOR,PREEMPTEE

QOSCFG[dcos]          	PRIORITY=200000 QFLAGS=PREEMPTOR
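With this configuration, a job picks up its QOS through the class's QDEF,
so users simply submit to the matching Torque queue. For illustration
(the script names are placeholders):

    # Ordinary, fully preemptable work goes to class c
    qsub -q c batch_job.sh

    # Urgent work goes to class a; with the QOS priorities above it can
    # preempt running b and c jobs (but not d jobs, which are not PREEMPTEE)
    qsub -q a urgent_job.sh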


The solution is not perfect, because of a Maui bug: when
Maui has sent the requeue order to Torque, it does not
wait for the requeue to complete, and immediately asks for
the new job to run on nodes that may not yet have been
evacuated. (Moab has the same problem, but tries to work
around it by retrying the job start many times.)

As far as I know, CRI has not yet fixed the bug. Myself,
I have fixed it in a straightforward way, by putting a sleep
statement after the requeue request. Other members of
this list may have more elegant fixes...
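For what it's worth, the sleep workaround amounts to a one-line change
along these lines (pseudocode only; the function and constant names are
illustrative, not the actual Maui identifiers):

    /* in the preemption path, after telling the RM to requeue */
    send_requeue_request(rm, preempted_job);
    sleep(REQUEUE_GRACE_SECONDS);  /* crude: give Torque time to evacuate the nodes */
    start_job(preemptor_job, reserved_nodes);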

Best regards,
-- Lennart Karlsson <Lennart.Karlsson at nsc.liu.se>
   National Supercomputer Centre in Linkoping, Sweden
   http://www.nsc.liu.se



