[Mauiusers] Preempted (suspended) job not restarting when it should.

David Corredor tecnico at nsstc.uah.edu
Fri Mar 31 17:08:20 MST 2006


This is what I'm trying now. It works for me, though I still need to narrow 
down whether these changes are all that was needed, because I changed many 
things while troubleshooting:

changeparam RESDEPTH 1
changeparam RESERVATIONPOLICY NEVER
changeparam RESWEIGHT 0

This effectively disables the reservations feature. I don't use it, so that 
works for me.

The behavior I have now is that as jobs are submitted, they go straight to the 
"Blocked" queue, the preemptor jobs go briefly to the "Idle" queue while the 
running job that will be preempted is "Suspending." Once the preemptee job is 
suspended, the preemptor starts executing and the preempted job goes back to 
execution when the preemptor finishes. Just what I wanted.

David C.



On Friday 31 March 2006 03:50 pm, you wrote:
> On Mar 15, 2006, at 8:55 AM, David Corredor wrote:
> >   I'm trying to set up some basic preemption with a "suspend" policy
> >   within Maui. The preemption part is working, except that the job that
> >   gets preempted (suspended) doesn't restart execution until after all
> >   other jobs in the Idle queue have finished executing, even if those
> >   jobs don't have the preemptor flag set, and as far as I can tell, those
> >   jobs don't have a higher priority or xfactor than the suspended job
> >   either.
> >
> >   By looking at the logs, it seems to me that while the first job was
> >   suspended and the preemptor was running, the next idle job in the queue
> >   (with the same priority as the suspended one) somehow had the node
> >   reserved for it next, and so when the suspended job is supposed to
> >   restart, it doesn't find an available node.
> >
> >   I would appreciate any hints in this regard.
>
> I've been suffering with the same issue and was led to believe that
> adding the following to my config would fix things:
>
> FSPOLICY UTILIZEDPS
> CONSUMEDWEIGHT         3
>
> However, I have not found this to resolve anything. Here is some live
> output from 'diagnose -p', which I've edited to show only suspended
> jobs:
>
> # ./diagnose -p
> diagnosing job priority information (partition: ALL)
>
> Job                    PRIORITY*   Cred(  QOS:Class)  Serv(QTime)  Targ(QTime)   Res(Cons )
>               Weights   --------       5(    2:    8)     1(    1)     1(    1)     1(    3)
>
> 8469                       3547    19.7( 10.0: 15.0)  80.3(2847.)   0.0(  0.0)   0.0(  0.0)
> 8968                       2073    43.4( 10.0: 20.0)  56.6(1173.)   0.0(  0.0)   0.0(  0.0)
> 8969                       2073    43.4( 10.0: 20.0)  56.6(1173.)   0.0(  0.0)   0.0(  0.0)
> 8970                       2073    43.4( 10.0: 20.0)  56.6(1173.)   0.0(  0.0)   0.0(  0.0)
> 8971                       2073    43.4( 10.0: 20.0)  56.6(1173.)   0.0(  0.0)   0.0(  0.0)
> 8972                       2073    43.4( 10.0: 20.0)  56.6(1173.)   0.0(  0.0)   0.0(  0.0)
>
>
> I would think with the parameters mentioned above enabled in my
> maui.cfg that there should be some kind of value listed in the "Res
> (Cons )" column adding to a job's priority. If this were happening,
> then suspended jobs, which have already consumed CPU time, should
> acquire additional priority points at a higher rate than idle jobs
> that were submitted at the same time, thereby meaning they would
> (hopefully) be resumed before an idle job in the queue was started. I
> have not found this to be the case.
>
> Any info on how to solve this would be MUCH appreciated...
>
>
> Just for kicks... here's my entire maui.cfg:
>
>
> SERVERHOST node001.cluster
> SERVERPORT 42559
> SERVERMODE NORMAL
>
> ADMIN1 root
>
> LOGFILE maui.log
> LOGFILEMAXSIZE 10000000
> LOGLEVEL 3
>
> RMCFG[base] TYPE=PBS TIMEOUT=90
> RMPOLLINTERVAL 00:00:10
>
> BACKFILLPOLICY FIRSTFIT
> NODEALLOCATIONPOLICY MINRESOURCE
> NODEACCESSPOLICY SHARED
> PREEMPTPOLICY SUSPEND
> RESERVATIONPOLICY NEVER
> FSPOLICY UTILIZEDPS
>
> DEFERTIME 1:00
> DEFERCOUNT 999
> DEFERSTARTCOUNT 10
>
> CREDWEIGHT             5
> CLASSWEIGHT            8
> QOSWEIGHT              2
> QUEUETIMEWEIGHT        1
> TARGETQUEUETIMEWEIGHT  1
> CONSUMEDWEIGHT         3
>
> QOSCFG[lopri] PRIORITY=10 QFLAGS=PREEMPTEE FLAGS=PREEMPTEE JOBFLAGS=PREEMPTEE
> QOSCFG[hipri] PRIORITY=10000 QFLAGS=PREEMPTOR FLAGS=PREEMPTOR JOBFLAGS=PREEMPTOR
>
> CLASSCFG[long-loprio]   QDEF=lopri MAXMEM=1200 MAXJOBPERUSER=30
> CLASSCFG[long]          QDEF=lopri MAXMEM=1200
> CLASSCFG[short]         QDEF=lopri
> CLASSCFG[interact]      QDEF=hipri
> CLASSCFG[swbuild]       QDEF=hipri
>
> NODECFG[DEFAULT] MAXJOB=5 MAXLOAD=3
>
> USERCFG[DEFAULT] QTTARGET=0:00:01 QLIST=lopri,hipri
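
As a sanity check on the quoted 'diagnose -p' output, the PRIORITY column can
be reproduced by hand from the Weights row. The sketch below is in Python. The
formula in it is an assumption inferred from how the quoted numbers add up, not
something taken from the Maui source, and the function name priority() is made
up for illustration.

```python
# Weights copied from the "Weights" row of the quoted diagnose -p output:
#   Cred 5 (QOS 2 : Class 8), Serv 1 (QTime 1).
# ASSUMPTION: priority = component weight * sum(subcomponent weight * value).
CRED_WEIGHT = 5    # CREDWEIGHT
QOS_WEIGHT = 2     # QOSWEIGHT
CLASS_WEIGHT = 8   # CLASSWEIGHT
SERV_WEIGHT = 1    # service component weight from the Weights row
QTIME_WEIGHT = 1   # QUEUETIMEWEIGHT


def priority(qos, cls, qtime):
    """Return (cred, serv, total) for one job under the assumed formula."""
    cred = CRED_WEIGHT * (QOS_WEIGHT * qos + CLASS_WEIGHT * cls)
    serv = SERV_WEIGHT * (QTIME_WEIGHT * qtime)
    # The Targ and Res(Cons) columns are 0.0 in the quoted output,
    # so they add nothing to the total here.
    return cred, serv, cred + serv


# Per-job values (QOS, Class, QTime) copied from the quoted output.
jobs = {
    "8469": (10, 15, 2847),
    "8968": (10, 20, 1173),
}

for job_id, (qos, cls, qtime) in jobs.items():
    cred, serv, total = priority(qos, cls, qtime)
    cred_pct = round(100 * cred / total, 1)
    serv_pct = round(100 * serv / total, 1)
    print(job_id, total, cred_pct, serv_pct)
```

With the quoted values this gives 3547 for job 8469 and 2073 for job 8968,
matching the PRIORITY column. Cred and Serv therefore account for the whole
total, which is consistent with the observation that the Res(Cons) component
contributes nothing in this output.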

