[Mauiusers] preemptor job holds up the rest of the system

Bill Wichser bill at Princeton.EDU
Tue Jan 26 15:15:54 MST 2010


Nothing.  No change whatsoever.

Greenseid, Joseph M (IS) wrote:
> What happens if you try changing your BACKFILLPOLICY to FIRSTFIT instead 
> of BESTFIT (just to try one of the other backfill algorithms)?
>  
> Does it behave differently?
>  
> --Joe
> 
> ------------------------------------------------------------------------
> *From:* mauiusers-bounces at supercluster.org on behalf of Bill Wichser
> *Sent:* Tue 1/26/2010 2:03 PM
> *To:* Bill Wichser
> *Cc:* mauiusers at supercluster.org
> *Subject:* Re: [Mauiusers] preemptor job holds up the rest of the system
> 
> Still haven't solved this problem.  Again, here is the scenario:
> 
> A system reservation is in place for 10 days from now.
> 
> Users submit jobs for, say, 15 days, and these land at the top of the
> IDLE stack (showq).
> 
> Other jobs which are submitted, say for 2 days or 4 hours, are sitting
> behind the 15 day jobs waiting to run.
> 
> The jobs holding everything up were submitted to a CLASS with preemptor QOS.
> 
> Removing the system reservation allows everything to run.  Manually
> placing the long-running jobs on HOLD allows the rest to run.  My
> BACKFILLDEPTH more than covers all the jobs in the IDLE state as
> indicated by the asterisk next to job number.
> 
> Other maui configurations on other clusters, without this preemptor
> stuff, work as expected.  I'm out of clues here!  The relevant stanza
> from maui.cfg looks like this:
> 
> BACKFILLPOLICY        BESTFIT
> BACKFILLMETRIC          PROCSECONDS
> BACKFILLDEPTH         20
> RESERVATIONPOLICY     FIRSTFIT
> RESERVATIONDEPTH[0]     24
> RESDEPTH                24
> 
> Any clues on what I might be missing?
> 
> Thanks,
> Bill
> 
> 
> Bill Wichser wrote:
>  > We've been running Maui/Torque for quite a few years here.  On our
>  > latest cluster there has been a need to start using preemptive queues
>  > (classes).  This has done fine except for a problem we had when setting
>  > a system reservation.
>  >
>  > So a system reservation was set across the entire cluster.  Normally, on
>  > clusters without preemptive scheduling, jobs exceeding the wallclock
>  > time will block, allowing other shorter jobs to backfill in.
>  >
>  > When using a preemptive class, when a job exceeds available wallclock
>  > time, it remains in the IDLE state preventing other backfill jobs from
>  > running.  Yes, these other jobs are preemptee class.
>  >
>  > Manually placing the overruns on hold allows these other jobs to get
>  > scheduled.
>  >
>  > Am I simply missing something in the configuration?  Or is this the
>  > expected behavior when using preemptor/preemptee classes/QOS?
>  >
>  > Thanks,
>  > Bill
>  > _______________________________________________
>  > mauiusers mailing list
>  > mauiusers at supercluster.org
>  > http://www.supercluster.org/mailman/listinfo/mauiusers

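For readers following the thread, the backfill behaviour Bill expected can be
sketched roughly as below. This is only an illustration of the scenario
described above, not Maui's actual algorithm: it assumes free processors are
available, ignores priority and preemption entirely, and the names Job and
startable_now are hypothetical. A job is treated as startable only if its
requested walltime ends before the system reservation begins.

```python
from dataclasses import dataclass

HOUR = 3600
DAY = 24 * HOUR


@dataclass
class Job:
    name: str
    walltime: int  # requested wallclock limit, in seconds


def startable_now(idle_jobs, seconds_until_reservation):
    """Return the idle jobs that would finish before the reservation starts.

    Hypothetical helper: it only compares walltime against the window left
    before the reservation and does not model nodes, priority, or QOS.
    """
    return [job for job in idle_jobs if job.walltime <= seconds_until_reservation]


# Idle queue in priority order, as in the thread: a 15-day job on top,
# with 2-day and 4-hour jobs waiting behind it.
idle = [Job("long", 15 * DAY), Job("medium", 2 * DAY), Job("short", 4 * HOUR)]

# The system reservation begins 10 days from now.
runnable = startable_now(idle, 10 * DAY)
print([job.name for job in runnable])
```

Under these assumptions the 15-day job cannot start, while the 2-day and
4-hour jobs fit in the window. The problem reported in the thread is that,
with the preemptor QOS in use, those shorter jobs stay idle instead.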
