[Mauiusers] preemptor job holds up the rest of the system
Greenseid, Joseph M (IS)
Joseph.Greenseid at ngc.com
Tue Jan 26 12:40:49 MST 2010
What happens if you try changing your BACKFILLPOLICY to FIRSTFIT instead of BESTFIT (just to try one of the other backfill algorithms)?
Does it behave differently?
From: mauiusers-bounces at supercluster.org on behalf of Bill Wichser
Sent: Tue 1/26/2010 2:03 PM
To: Bill Wichser
Cc: mauiusers at supercluster.org
Subject: Re: [Mauiusers] preemptor job holds up the rest of the system
Still haven't solved this problem. Again, here is the scenario:
A system reservation is in place for 10 days from now.
Users submit jobs requesting, say, 15 days of walltime, and these land at the top of the IDLE stack.
Other jobs that are submitted with shorter limits, say 2 days or 4 hours, sit
behind the 15-day jobs waiting to run.
The jobs holding everything up were submitted to a CLASS with preemptor QOS.
Removing the system reservation allows everything to run. Manually
placing the long-running jobs on HOLD allows the rest to run. My
BACKFILLDEPTH more than covers all the jobs in the IDLE state, as
indicated by the asterisk next to each job number.
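The difference between the expected behaviour and the reported symptom can be sketched as a single backfill pass over the priority-ordered IDLE list. This is a simplified model, not Maui's implementation. It assumes processors are plentiful, so the only constraint is whether a job's walltime ends before the reservation. The function `backfill` and its `stop_at_first_blocker` flag are hypothetical and only contrast the two outcomes.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    procs: int
    walltime_h: float  # requested wallclock limit, in hours


def backfill(idle_jobs: list[Job], hours_until_reservation: float,
             stop_at_first_blocker: bool) -> list[str]:
    """Return the names of jobs allowed to start now, assuming enough free
    processors that only the reservation boundary matters."""
    started = []
    for job in idle_jobs:  # highest priority first
        if job.walltime_h <= hours_until_reservation:
            started.append(job.name)
        elif stop_at_first_blocker:
            # Reported symptom: the non-fitting job at the top of the
            # queue holds up every job behind it.
            break
        # Expected behaviour: skip the non-fitting job and keep looking.
    return started
```

With the reservation 240 hours (10 days) away and a queue of 15-day, 2-day and 4-hour jobs, the skipping pass starts the 2-day and 4-hour jobs. The blocking pass starts nothing, which matches what the thread describes.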
Other maui configurations on other clusters, without this preemptor
stuff, work as expected. I'm out of clues here! The relevant stanza
from maui.cfg looks like this:
Any clues on what I might be missing?
Bill Wichser wrote:
> We've been running Maui/Torque for quite a few years here. On our
> latest cluster there has been a need to start using preemptive queues
> (classes). This has done fine except for a problem we had when setting
> a system reservation.
> So a system reservation was set across the entire cluster. Normally, on
> clusters without preemptive scheduling, jobs whose requested wallclock
> time runs past the reservation start will block, allowing shorter jobs to backfill in.
> When using a preemptive class, a job that exceeds the available wallclock
> time remains in the IDLE state and prevents other backfill jobs from
> running. Yes, these other jobs are in the preemptee class.
> Manually placing the overruns on hold allows these other jobs to get
> Am I simply missing something in the configuration? Or is this the
> expected behavior when using preemptor/preemptee classes/QOS?
> mauiusers mailing list
> mauiusers at supercluster.org
-------------- next part --------------
An HTML attachment was scrubbed...