[Mauiusers] job preemption - can this be done?
Lennart.Karlsson at nsc.liu.se
Tue Feb 17 09:06:32 MST 2009
> Running Maui/Torque on a number of clusters. Have never used the
> preemption stuff before, but from searching through the documentation, what
> I'd like to do might not be possible here.
> I'd like 4 classes/queues: A preempts B, B preempts C, and D preempts C.
> C is the general-purpose, totally preemptable class. Both D and B can get
> the resources they need by preempting jobs in this class. I believe
> that this would all be fine, except for the A class, which needs
> full access to the machine (minus D class jobs) NOW!
> I've looked through the Maui docs as well as the Moab ones, to no avail.
> Preliminary searching leads me to SGE and LSF, but it sure would be
> nice to make this something Maui could handle.
> Does anyone know of a way, or something I may be missing? I guess I could
> kill jobs from a prologue script for A class jobs until I freed enough
> cores. I don't know.
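The prologue fallback in the question amounts to choosing victim jobs until enough cores are free. A minimal greedy sketch, assuming hypothetical class names a, b, c, d and a list of running jobs obtained elsewhere (for example by parsing qstat output, not shown); this is an illustration, not a Maui or Torque feature:

```python
from dataclasses import dataclass

# Hypothetical preemption table for the scheme A > B > C and D > C.
# Classes are tried in the listed order, so the fully preemptable
# class c is always drained before b.
PREEMPTABLE_BY = {
    "a": ["c", "b"],
    "b": ["c"],
    "d": ["c"],
}

@dataclass
class Job:
    job_id: str
    job_class: str
    cores: int

def pick_victims(new_class, cores_needed, free_cores, running):
    """Return the jobs to kill so that at least cores_needed cores are free,
    an empty list if enough cores are already free, or None if killing
    every eligible job would still not be enough."""
    if free_cores >= cores_needed:
        return []
    victims = []
    freed = free_cores
    for victim_class in PREEMPTABLE_BY.get(new_class, []):
        for job in running:
            if job.job_class != victim_class:
                continue
            victims.append(job)
            freed += job.cores
            if freed >= cores_needed:
                return victims
    return None
```

Being greedy, it may kill more jobs than strictly necessary. D jobs never appear as victims, matching the "minus D class jobs" requirement.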
If I understand your problem correctly, you can solve it like this:
# Short defer times, but allow many deferrals
# Put priority on QOS only.
# The zeroes are there, so b jobs do not preempt b jobs,
# and so c jobs do not preempt c jobs.
# Requeue jobs. There are other policies, but I have not tried them
# Define the queues
QOSCFG[acos] PRIORITY=500000 QFLAGS=PREEMPTOR
QOSCFG[bcos] PRIORITY=300000 QFLAGS=PREEMPTOR,PREEMPTEE
QOSCFG[ccos] PRIORITY=100000 QFLAGS=PREEMPTOR,PREEMPTEE
QOSCFG[dcos] PRIORITY=200000 QFLAGS=PREEMPTOR
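To check that these QOS settings give the intended relation (A preempts B and C, B preempts C, D preempts C, and nothing preempts A or D), here is a simplified Python model. It assumes that a PREEMPTOR job may preempt a PREEMPTEE job only when its priority is strictly higher, and that job priority comes from the QOS PRIORITY alone (the other weights set to zero). The function names are hypothetical; this is not Maui's code:

```python
# QOS name -> (PRIORITY, QFLAGS), copied from the QOSCFG lines above.
QOS = {
    "acos": (500000, {"PREEMPTOR"}),
    "bcos": (300000, {"PREEMPTOR", "PREEMPTEE"}),
    "ccos": (100000, {"PREEMPTOR", "PREEMPTEE"}),
    "dcos": (200000, {"PREEMPTOR"}),
}

def can_preempt(preemptor, preemptee):
    """Assumed rule: PREEMPTOR flag on one side, PREEMPTEE on the other,
    and a strictly higher priority for the preemptor."""
    p_prio, p_flags = QOS[preemptor]
    t_prio, t_flags = QOS[preemptee]
    return ("PREEMPTOR" in p_flags and "PREEMPTEE" in t_flags
            and p_prio > t_prio)

def preemption_pairs():
    """All (preemptor, preemptee) pairs allowed under the assumed rule."""
    return sorted((a, b) for a in QOS for b in QOS if can_preempt(a, b))

if __name__ == "__main__":
    for a, b in preemption_pairs():
        print(f"{a} can preempt {b}")
```

Under these assumptions the only pairs are acos/bcos, acos/ccos, bcos/ccos and dcos/ccos. Because the comparison is strict, jobs with equal priority (two bcos jobs, say) never preempt each other, which is why the other priority weights are zeroed.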
The solution is not perfect, because of a Maui bug: when
Maui has sent the requeue order to Torque, it does not
wait for the requeue to complete and immediately asks
Torque to start the new job on nodes that may not yet
have been vacated. (Moab has the same problem, but tries
to work around it by retrying the job start many times.)
As far as I know, CRI (Cluster Resources Inc.) has not yet
fixed the bug. I have fixed it myself in a straightforward
way, by putting a sleep statement after the requeue request.
Other members of this list may have more elegant fixes...
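One alternative to a fixed sleep is to poll until the nodes are actually free, bounded by a timeout. The sketch below is in Python for illustration only (a real fix would live in Maui's C source); `is_vacated` is a hypothetical callback, for example one that asks Torque whether the requeued job has left the nodes:

```python
import time

def wait_until_vacated(is_vacated, timeout=60.0, interval=1.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll is_vacated() every `interval` seconds until it returns True or
    `timeout` seconds have passed. Returns True if the nodes were vacated
    in time, False on timeout. `sleep` and `clock` are injectable so the
    loop can be tested without real waiting."""
    deadline = clock() + timeout
    while True:
        if is_vacated():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Compared with a fixed sleep, this starts the new job as soon as the nodes are free and still gives up after a bounded time, which is roughly what Moab's repeated start attempts achieve from the other direction.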
-- Lennart Karlsson <Lennart.Karlsson at nsc.liu.se>
National Supercomputer Centre in Linkoping, Sweden