[Mauiusers] preemptor job holds up the rest of the system
Bill Wichser
bill at Princeton.EDU
Tue Jan 26 15:15:54 MST 2010
Nothing. No change whatsoever.
Greenseid, Joseph M (IS) wrote:
> What happens if you try changing your BACKFILLPOLICY to FIRSTFIT instead
> of BESTFIT (just to try one of the other backfill algorithms)?
>
> Does it behave differently?
>
> --Joe
>
> ------------------------------------------------------------------------
> *From:* mauiusers-bounces at supercluster.org on behalf of Bill Wichser
> *Sent:* Tue 1/26/2010 2:03 PM
> *To:* Bill Wichser
> *Cc:* mauiusers at supercluster.org
> *Subject:* Re: [Mauiusers] preemptor job holds up the rest of the system
>
> Still haven't solved this problem. Again, here is the scenario:
>
> A system reservation is in place, starting 10 days from now.
>
> Users submit jobs requesting, say, 15 days of wallclock time, and these
> land at the top of the IDLE stack (showq).
>
> Other jobs which are submitted, say for 2 days or 4 hours, are sitting
> behind the 15 day jobs waiting to run.
>
> The jobs holding everything up were submitted to a CLASS with preemptor QOS.
>
> Removing the system reservation allows everything to run. Manually
> placing the long running jobs on HOLD allows the rest to run. My
> BACKFILLDEPTH more than covers all the jobs in the IDLE state, as
> indicated by the asterisk next to each job number.
>
> Other maui configurations on other clusters, without this preemptor
> stuff, work as expected. I'm out of clues here! The relevant stanza
> from maui.cfg looks like this:
>
> BACKFILLPOLICY BESTFIT
> BACKFILLMETRIC PROCSECONDS
> BACKFILLDEPTH 20
> RESERVATIONPOLICY FIRSTFIT
> RESERVATIONDEPTH[0] 24
> RESDEPTH 24
>
> Any clues on what I might be missing?
>
> Thanks,
> Bill
>
>
> Bill Wichser wrote:
> > We've been running Maui/Torque for quite a few years here. On our
> > latest cluster there has been a need to start using preemptive queues
> > (classes). This has worked fine except for a problem we had when setting
> > a system reservation.
> >
> > So a system reservation was set across the entire cluster. Normally, on
> > clusters without preemptive scheduling, jobs whose wallclock time exceeds
> > the time left before the reservation will block, allowing other shorter
> > jobs to backfill in.
> >
> > When using a preemptive class, when a job exceeds the available wallclock
> > time, it remains in the IDLE state, preventing other backfill jobs from
> > running. Yes, these other jobs are in the preemptee class.
> >
> > Manually placing the overruns on hold allows these other jobs to get
> > scheduled.
> >
> > Am I simply missing something in the configuration? Or is this the
> > expected behavior when using preemptor/preemptee classes/QOS?
> >
> > Thanks,
> > Bill
> > _______________________________________________
> > mauiusers mailing list
> > mauiusers at supercluster.org
> > http://www.supercluster.org/mailman/listinfo/mauiusers
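
For anyone following along, the behavior the thread expects can be sketched in a few lines of Python. This is only a toy model of the scenario described above (a reservation starting in 10 days, idle jobs of 15 days, 2 days, and 4 hours), not Maui's actual scheduling code. The function names and job names are hypothetical, and the model ignores node availability, priorities, and preemption entirely.

```python
from datetime import timedelta

# Hypothetical toy model of the scenario in this thread; NOT Maui's algorithm.
# Assumption: a job may start now only if it would finish before the
# system reservation begins.

def can_start_now(walltime, time_until_reservation):
    """True if a job of this walltime finishes before the reservation starts."""
    return walltime <= time_until_reservation

def backfill_candidates(idle_jobs, time_until_reservation):
    """Names of idle jobs, in queue order, that fit before the reservation."""
    return [name for name, walltime in idle_jobs
            if can_start_now(walltime, time_until_reservation)]

time_until_reservation = timedelta(days=10)

# Queue order as described: the long preemptor job sits at the top.
idle_jobs = [
    ("long_preemptor", timedelta(days=15)),
    ("two_day", timedelta(days=2)),
    ("four_hour", timedelta(hours=4)),
]

runnable = backfill_candidates(idle_jobs, time_until_reservation)
print(runnable)
```

Under this simplified model the 15 day job is skipped and the two shorter jobs are eligible to run, which matches what is reported on the clusters without preemption. The problem described in the thread is that, with the preemptor QOS in play, the shorter jobs stay idle anyway.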