[Mauiusers] Re: Suspended jobs resume execution

David Corredor tecnico at nsstc.uah.edu
Fri Apr 21 12:24:20 MDT 2006


Just to comment on the issue of priorities with suspended jobs. I have noticed that 
my problem getting suspended jobs to resume boils down to the following:

   - Jobs only continue to increase their priority while they are in the IDLE queue, but
 suspended jobs are still in the RUN queue, so their priority stays fixed at the
 same value over time.
   - I have seen this very clearly in the logs.
   - So what happens is that the jobs in the IDLE queue eventually reach a higher
 priority than the suspended job. Ideally, the suspended job would restart
 after the preemptor job finishes, but since the other job in the IDLE queue already
 has a higher priority, that job gets an automatic reservation for the nodes
 once they are free and "preempts" the suspended job once again. This happens
 regardless of whether the new job has the PREEMPTOR flag or not.
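
To make the priority race above concrete, here is a small Python sketch. It is a simplified, hypothetical model, not Maui's code: it assumes an idle job gains a fixed number of priority points per minute of queue time while the suspended job's priority stays frozen. The values used (10, 5, 1 point per minute) are made up for illustration, and the helper names first_overtake_minute and who_gets_nodes are hypothetical.

```python
# Hypothetical model of the behaviour described above: an idle job's
# priority grows with queue time, while a suspended job (still counted
# as running) keeps the priority it had when it was suspended.
# All numbers are illustrative assumptions, not Maui defaults.

def first_overtake_minute(suspended_prio, idle_base_prio, accrual_per_min, horizon_min):
    """Return the first minute at which the idle job's priority is strictly
    higher than the frozen priority of the suspended job, or None if that
    never happens within the horizon."""
    for minute in range(horizon_min + 1):
        idle_prio = idle_base_prio + accrual_per_min * minute
        if idle_prio > suspended_prio:
            return minute
    return None


def who_gets_nodes(minute_preemptor_finishes, overtake_minute):
    """Return which job claims the freed nodes when the preemptor ends."""
    if overtake_minute is not None and overtake_minute <= minute_preemptor_finishes:
        return "idle job"
    return "suspended job"


if __name__ == "__main__":
    # Suspended job frozen at priority 10; an idle job starts at 5 and
    # (hypothetically) gains 1 point per minute of queue time.
    t = first_overtake_minute(suspended_prio=10, idle_base_prio=5,
                              accrual_per_min=1, horizon_min=120)
    print(t)                       # 6
    print(who_gets_nodes(60, t))   # idle job
    print(who_gets_nodes(3, t))    # suspended job
```

With these made-up numbers the idle job overtakes the suspended one after 6 minutes, so a preemptor that runs for an hour hands the freed nodes to the idle job instead of letting the suspended job resume.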


 I've changed many settings and I think I now have a working configuration (posted at the end).
I had been testing it for several days, and I noticed that my second problem was
actually the test jobs I was using to troubleshoot this issue:

   - I found out that if the preemptor job runs and finishes in less than 30 seconds, the
 suspended job cannot resume because of an invalid start time (its start time is set in the
 future), and it gets passed over.
   - If the preemptor job runs for over 30 seconds, then it's all good. Except that short (<30 sec)
 jobs are not uncommon (for example, users testing new binaries that may crash right away),
 so when that happens, the user who submitted the long job and got preempted is
 out of luck.

  Here is my configuration. I believe it is working (limited testing so far); the only remaining
problem is preemptor jobs that finish in under 30 seconds.

--------------------
  from maui.cfg
--------------------
RMPOLLINTERVAL          00:00:30
SERVERPORT              42559
SERVERMODE              NORMAL
BACKFILLPOLICY          BESTFIT
PREEMPTPOLICY           SUSPEND
WCVIOLATIONACTION       PREEMPT
RESERVATIONPOLICY       NEVER
CREDWEIGHT              1
USERWEIGHT              0
GROUPWEIGHT             0
XFACTORWEIGHT           0
QOSWEIGHT               1
CLASSWEIGHT             1
RESWEIGHT               1
QUEUETIMEWEIGHT         0
JOBPRIOACCRUALPOLICY    FULLPOLICY
JOBNODEMATCHPOLICY      EXACTPROC
NODEALLOCATIONPOLICY    PRIORITY
NODEAVAILABILITYPOLICY  UTILIZED
NODEACCESSPOLICY        SHARED
NODECFG[default]        PRIORITYF='APROCS - LOAD + 0.01 * AMEM + 0.1 * ASWAP'
CLASSCFG[verylong]  QDEF=verylong
CLASSCFG[long]      QDEF=long
CLASSCFG[fast]      QDEF=fast
QOSCFG[verylong]  QFLAGS=PREEMPTEE            PRIORITY=5
QOSCFG[long]      QFLAGS=PREEMPTEE:PREEMPTOR  PRIORITY=10
QOSCFG[fast]      QFLAGS=PREEMPTOR            PRIORITY=1000

---------------------------------------------------------------
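
The QOS section of the configuration above can be read as a preemption table. The Python sketch below is a simplified, hypothetical model rather than Maui's real logic: it assumes a queued job can suspend a running job only when the queued job's QOS carries PREEMPTOR, the running job's QOS carries PREEMPTEE, and the queued job's priority is strictly higher (the last condition is what the quoted reply about flipping priorities suggests). It ignores queue-time accrual, and the names CLASS_TO_QOS, QOS and can_preempt are illustrative only.

```python
# Simplified, hypothetical model of the QOS setup in the maui.cfg above.
# Assumption: a queued job may suspend a running job only if the queued
# job's QOS has PREEMPTOR, the running job's QOS has PREEMPTEE, and the
# queued job's priority is strictly higher. Not Maui's actual implementation.

# CLASSCFG[...] QDEF=... : each class maps to a default QOS of the same name.
CLASS_TO_QOS = {"verylong": "verylong", "long": "long", "fast": "fast"}

# QOSCFG[...] QFLAGS=... PRIORITY=...
QOS = {
    "verylong": {"flags": {"PREEMPTEE"}, "priority": 5},
    "long": {"flags": {"PREEMPTEE", "PREEMPTOR"}, "priority": 10},
    "fast": {"flags": {"PREEMPTOR"}, "priority": 1000},
}


def can_preempt(queued_class, running_class):
    """Return True if, under the assumed rule, a queued job of queued_class
    may suspend a running job of running_class."""
    q = QOS[CLASS_TO_QOS[queued_class]]
    r = QOS[CLASS_TO_QOS[running_class]]
    return ("PREEMPTOR" in q["flags"]
            and "PREEMPTEE" in r["flags"]
            and q["priority"] > r["priority"])


if __name__ == "__main__":
    for queued in QOS:
        for running in QOS:
            if can_preempt(queued, running):
                print(f"{queued} can suspend {running}")
```

Under these assumptions it prints three pairs: long over verylong, fast over verylong, and fast over long. Applying the same rule to the quoted 'short' (PRIORITY=100, PREEMPTOR) and 'default' (PRIORITY=500, PREEMPTEE) QOS, a short job would never be allowed to suspend a default job, which matches the advice to flip those priorities.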

David




On Thursday 20 April 2006 01:00 pm, mauiusers-request at supercluster.org wrote:
> Message: 1
> Date: Wed, 19 Apr 2006 23:02:34 -0700
> From: James Wigdahl <james at wigdahl.com>
> Subject: Re: [Mauiusers] Suspended jobs resume execution
> To: "Ronny T. Lampert" <telecaadmin at uni.de>
> Cc: mauiusers at supercluster.org
> Message-ID: <FFD7FB23-C5B1-4975-A9C7-2B21AF4AF4A9 at wigdahl.com>
> Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed
> 
> 
> On Apr 19, 2006, at 6:31 AM, Ronny T. Lampert wrote:
> 
> > QOSCFG[short]           PRIORITY=100 QFLAGS=PREEMPTOR
> > QOSCFG[default]         PRIORITY=500 QFLAGS=PREEMPTEE
> 
> Learn and use Maui's "diagnose -p". Your 'default' jobs always have  
> higher priority than your 'short' jobs and will therefore always be  
> favored. Flip the priorities here and you should see things start to  
> work as you'd like.
> 

