[Mauiusers] Re: Suspended jobs resume execution
tecnico at nsstc.uah.edu
Fri Apr 21 12:24:20 MDT 2006
Just to comment on the issue of priorities with suspended jobs. I have noticed that
my problem getting suspended jobs to resume boils down to the following:
- Jobs only continue to increase their priority while they are in the IDLE queue, but
those suspended jobs are still in the RUN queue, so their priority stays fixed at the
same value over time.
- I have seen this very clearly in the logs.
- So what happens is that the jobs in the IDLE queue eventually get a higher
priority than the suspended job. Ideally, the suspended job should resume
after the preemptor job finishes, but since the other job in the IDLE queue already
has a higher priority, that other job gets an automatic reservation for the nodes
once they are free, and it "preempts" the suspended job once again. This happens
regardless of whether the new job has the preemptor flag or not.
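To make the priority problem concrete, here is a minimal Python sketch under a deliberately simplified, hypothetical model: an idle job's priority is its base (QOS) priority plus a fixed number of points per minute spent in the IDLE queue, while the suspended job's priority stays frozen at the value it had when it was suspended. Maui's real priority calculation has more components; the function names and the linear per-minute weight are illustrative assumptions, not Maui's API.

```python
from typing import Optional

# Hypothetical, simplified model (not Maui's actual formula):
# an idle job gains `weight` priority points per minute in the IDLE queue,
# while a suspended job sits in the RUN queue with its priority frozen.


def idle_priority(base: int, weight: int, minutes_waiting: int) -> int:
    """Priority of an idle job after waiting `minutes_waiting` minutes."""
    return base + weight * minutes_waiting


def overtake_minute(frozen_priority: int, idle_base: int, weight: int) -> Optional[int]:
    """First minute at which the idle job's priority strictly exceeds the
    suspended job's frozen priority, or None if it never does."""
    if idle_base > frozen_priority:
        return 0  # already ahead before it has waited at all
    if weight <= 0:
        return None  # priority never grows, so it never catches up
    minute = 0
    while idle_priority(idle_base, weight, minute) <= frozen_priority:
        minute += 1
    return minute


if __name__ == "__main__":
    # Suspended job frozen at 500; idle job starts at 100 and gains 1 point/minute.
    print(overtake_minute(frozen_priority=500, idle_base=100, weight=1))  # 401
```

Under these assumptions, any positive per-minute weight means the idle job eventually passes the frozen value; from that minute on it, not the suspended job, is first in line for the freed nodes.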
I've changed many settings and I think I now have a working configuration (posted at the end). I have been
testing this configuration for several days, and I noticed that part of my problem was
actually the test jobs I was using to troubleshoot this issue:
- I found that if the preemptor job runs and finishes in less than 30 seconds, the
suspended job cannot resume because of an invalid start time (its start time is set in the
future), and other jobs jump ahead of it.
- If the preemptor job runs for more than 30 seconds, then it's all good. The catch is that short (<30 sec)
jobs are not uncommon (for example, users testing new binaries that may crash right away). So
if that happens, the user who submitted the long job and got preempted is now out of luck.
Here is my configuration. I believe it is working (limited testing so far); the only remaining problem
is short (<30 sec) preemptor jobs.
NODECFG[default] PRIORITYF='APROCS - LOAD + 0.01 * AMEM + 0.1 * ASWAP'
QOSCFG[verylong] QFLAGS=PREEMPTEE PRIORITY=5
QOSCFG[long] QFLAGS=PREEMPTEE:PREEMPTOR PRIORITY=10
QOSCFG[fast] QFLAGS=PREEMPTOR PRIORITY=1000
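In the spirit of the "flip the priorities" advice in the quoted reply, a small hypothetical sanity check can scan `QOSCFG` lines and flag any PREEMPTOR QOS whose PRIORITY is not strictly higher than a PREEMPTEE QOS. This is a heuristic Python sketch, not a Maui tool: it only understands the `QOSCFG[name] KEY=VALUE` form used in this thread, looks only at the QOS PRIORITY component, and ignores every other parameter.

```python
import re
from typing import Dict, Iterable, List, Set, Tuple

# Matches lines of the form used in this thread, e.g.
#   QOSCFG[fast] QFLAGS=PREEMPTOR PRIORITY=1000
QOS_LINE = re.compile(r"^QOSCFG\[(?P<name>[^\]]+)\]\s+(?P<attrs>.*)$")


def parse_qos(lines: Iterable[str]) -> Dict[str, Tuple[int, Set[str]]]:
    """Map each QOS name to (PRIORITY, set of QFLAGS); other lines are ignored."""
    qos: Dict[str, Tuple[int, Set[str]]] = {}
    for line in lines:
        m = QOS_LINE.match(line.strip())
        if not m:
            continue
        attrs = dict(tok.split("=", 1) for tok in m.group("attrs").split() if "=" in tok)
        flags = {f for f in attrs.get("QFLAGS", "").split(":") if f}
        qos[m.group("name")] = (int(attrs.get("PRIORITY", "0")), flags)
    return qos


def ordering_problems(qos: Dict[str, Tuple[int, Set[str]]]) -> List[Tuple[str, str]]:
    """Return (preemptor, preemptee) pairs where the preemptor's QOS priority
    is not strictly higher than the preemptee's (heuristic check only)."""
    problems = []
    for p_name, (p_prio, p_flags) in qos.items():
        if "PREEMPTOR" not in p_flags:
            continue
        for e_name, (e_prio, e_flags) in qos.items():
            if e_name != p_name and "PREEMPTEE" in e_flags and p_prio <= e_prio:
                problems.append((p_name, e_name))
    return sorted(problems)


if __name__ == "__main__":
    mine = [
        "QOSCFG[verylong] QFLAGS=PREEMPTEE PRIORITY=5",
        "QOSCFG[long] QFLAGS=PREEMPTEE:PREEMPTOR PRIORITY=10",
        "QOSCFG[fast] QFLAGS=PREEMPTOR PRIORITY=1000",
    ]
    quoted = [
        "QOSCFG[short] PRIORITY=100 QFLAGS=PREEMPTOR",
        "QOSCFG[default] PRIORITY=500 QFLAGS=PREEMPTEE",
    ]
    print(ordering_problems(parse_qos(mine)))    # []
    print(ordering_problems(parse_qos(quoted)))  # [('short', 'default')]
```

The configuration above passes, while the quoted `short`/`default` pair is flagged. The check says nothing about queue-time growth or the 30-second issue, so `diagnose -p` remains the way to see the priorities Maui actually computes.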
On Thursday 20 April 2006 01:00 pm, mauiusers-request at supercluster.org wrote:
> Message: 1
> Date: Wed, 19 Apr 2006 23:02:34 -0700
> From: James Wigdahl <james at wigdahl.com>
> Subject: Re: [Mauiusers] Suspended jobs resume execution
> To: "Ronny T. Lampert" <telecaadmin at uni.de>
> Cc: mauiusers at supercluster.org
> Message-ID: <FFD7FB23-C5B1-4975-A9C7-2B21AF4AF4A9 at wigdahl.com>
> On Apr 19, 2006, at 6:31 AM, Ronny T. Lampert wrote:
> > QOSCFG[short] PRIORITY=100 QFLAGS=PREEMPTOR
> > QOSCFG[default] PRIORITY=500 QFLAGS=PREEMPTEE
> Learn and use Maui's "diagnose -p". Your 'default' jobs always have
> higher priority than your 'short' jobs and will therefore always be
> favored. Flip the priorities here and you should see things start to
> work as you'd like.