[Mauiusers] Preempted (suspended) job not restarting when it should.

James Wigdahl james at wigdahl.com
Fri Mar 31 14:50:41 MST 2006


On Mar 15, 2006, at 8:55 AM, David Corredor wrote:

> I'm trying to set up some basic preemption with a "suspend" policy within
> Maui. The preemption part is working, except that the job that gets
> preempted (suspended) doesn't restart execution until after all other jobs
> in the Idle queue have finished executing, even if those jobs don't have
> the preemptor flag set and, as far as I can tell, don't have a higher
> priority or xfactor than the suspended job either.
>
> Looking at the logs, it seems to me that while the first job was
> suspended and the preemptor was running, the next idle job in the queue
> (with the same priority as the suspended one) somehow had the node
> reserved for it next, so when the suspended job is supposed to restart,
> it doesn't find an available node.
>
> I would appreciate any hints in this regard.

I've been suffering with the same issue and was led to believe that  
adding the following to my config would fix things:

FSPOLICY UTILIZEDPS
CONSUMEDWEIGHT         3

However, I have not found this to resolve anything. Here is some live
output from 'diagnose -p', which I've edited to show only suspended
jobs:

# ./diagnose -p
diagnosing job priority information (partition: ALL)

Job                    PRIORITY*   Cred(  QOS:Class)  Serv(QTime)  Targ(QTime)   Res(Cons )
              Weights   --------       5(    2:    8)     1(    1)     1(    1)     1(    3)

8469                       3547    19.7( 10.0: 15.0)  80.3(2847.)   0.0(  0.0)   0.0(  0.0)
8968                       2073    43.4( 10.0: 20.0)  56.6(1173.)   0.0(  0.0)   0.0(  0.0)
8969                       2073    43.4( 10.0: 20.0)  56.6(1173.)   0.0(  0.0)   0.0(  0.0)
8970                       2073    43.4( 10.0: 20.0)  56.6(1173.)   0.0(  0.0)   0.0(  0.0)
8971                       2073    43.4( 10.0: 20.0)  56.6(1173.)   0.0(  0.0)   0.0(  0.0)
8972                       2073    43.4( 10.0: 20.0)  56.6(1173.)   0.0(  0.0)   0.0(  0.0)


I would think that, with the parameters mentioned above enabled in my
maui.cfg, there should be some kind of value listed in the "Res(Cons )"
column adding to a job's priority. If that were happening, suspended jobs,
which have already consumed CPU time, should acquire priority points at a
higher rate than idle jobs submitted at the same time, and so would
(hopefully) be resumed before an idle job in the queue was started. I
have not found this to be the case.
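
As a sanity check on the diagnose output above, the PRIORITY column can be
reproduced by hand. The sketch below assumes each component contributes
component weight x (subcomponent weight x value), summed over components;
that assumption matches both PRIORITY values shown (3547 and 2073). The
function name and the "consumed" argument are made up for illustration, and
the units Maui would use for a consumed-resource value are not known from
this output. The Targ(QTime) component is left out because it is 0.0 for
every job listed.

```python
# Weights copied from the "Weights" row of the diagnose -p output.
CRED_WEIGHT, QOS_WEIGHT, CLASS_WEIGHT = 5, 2, 8
SERV_WEIGHT, QTIME_WEIGHT = 1, 1
RES_WEIGHT, CONS_WEIGHT = 1, 3


def job_priority(qos, cls, qtime, consumed=0):
    """Return (cred, serv, res, total) for one job.

    Assumed form: component weight * (subcomponent weight * value).
    'consumed' is a hypothetical stand-in for whatever value the
    Res(Cons) column would report; it is 0 for every job shown.
    """
    cred = CRED_WEIGHT * (QOS_WEIGHT * qos + CLASS_WEIGHT * cls)
    serv = SERV_WEIGHT * (QTIME_WEIGHT * qtime)
    res = RES_WEIGHT * (CONS_WEIGHT * consumed)
    return cred, serv, res, cred + serv + res


# Job 8469: QOS 10, Class 15, QTime 2847
print(job_priority(10, 15, 2847))   # (700, 2847, 0, 3547)
# Jobs 8968-8972: QOS 10, Class 20, QTime 1173
print(job_priority(10, 20, 1173))   # (900, 1173, 0, 2073)
```

Under the same assumption, any non-zero consumed value would add three
points per unit (for example, job_priority(10, 20, 1173, consumed=100)
totals 2373), so a Res(Cons) column stuck at 0.0 means the consumed weight
is contributing nothing to these jobs.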

Any info on how to solve this would be MUCH appreciated...


Just for kicks... here's my entire maui.cfg:


SERVERHOST node001.cluster
SERVERPORT 42559
SERVERMODE NORMAL

ADMIN1 root

LOGFILE maui.log
LOGFILEMAXSIZE 10000000
LOGLEVEL 3

RMCFG[base] TYPE=PBS TIMEOUT=90
RMPOLLINTERVAL 00:00:10

BACKFILLPOLICY FIRSTFIT
NODEALLOCATIONPOLICY MINRESOURCE
NODEACCESSPOLICY SHARED
PREEMPTPOLICY SUSPEND
RESERVATIONPOLICY NEVER
FSPOLICY UTILIZEDPS

DEFERTIME 1:00
DEFERCOUNT 999
DEFERSTARTCOUNT 10

CREDWEIGHT             5
CLASSWEIGHT            8
QOSWEIGHT              2
QUEUETIMEWEIGHT        1
TARGETQUEUETIMEWEIGHT  1
CONSUMEDWEIGHT         3

QOSCFG[lopri] PRIORITY=10 QFLAGS=PREEMPTEE FLAGS=PREEMPTEE JOBFLAGS=PREEMPTEE
QOSCFG[hipri] PRIORITY=10000 QFLAGS=PREEMPTOR FLAGS=PREEMPTOR JOBFLAGS=PREEMPTOR

CLASSCFG[long-loprio]   QDEF=lopri MAXMEM=1200 MAXJOBPERUSER=30
CLASSCFG[long]          QDEF=lopri MAXMEM=1200
CLASSCFG[short]         QDEF=lopri
CLASSCFG[interact]      QDEF=hipri
CLASSCFG[swbuild]       QDEF=hipri

NODECFG[DEFAULT] MAXJOB=5 MAXLOAD=3

USERCFG[DEFAULT] QTTARGET=0:00:01 QLIST=lopri,hipri


