[Mauiusers] Preempted (suspended) job not restarting when it should.

David Corredor tecnico at nsstc.uah.edu
Wed Mar 15 09:55:17 MST 2006


Hello everyone,

  I'm trying to set up some basic preemption with a "SUSPEND" policy within
  Maui. The preemption part works, except that the preempted (suspended)
  job doesn't resume execution until all other jobs in the Idle queue have
  finished. This happens even though those jobs don't have the PREEMPTOR
  flag set and, as far as I can tell, don't have a higher priority or
  XFACTOR than the suspended job either.
  
  Looking at the logs, it seems that while the first job was suspended and
  the preemptor was running, the next idle job in the queue (with the same
  priority as the suspended one) was somehow given the next reservation on
  the node. So when the suspended job is supposed to resume, it doesn't
  find an available node.
  
  I would appreciate any hints in this regard.

 
  Thanks.

    David


 1. Background
 2. Relevant maui.log info
 3. maui.cfg

**** Background ****

- Simple test (1 master and 1 compute node)
- The master does not run jobs (no pbs_mom)
- The node has 4 processors, and all jobs require 4 processors
- Job 38: preemptor (fast queue)
- Jobs 30 and 31: preemptees (long queue)
- Job 30 was running when job 38 was submitted
  and preempted it.

  When job 38 finished, job 30 (which was suspended)
should have resumed execution, and job 31 should have
kept waiting in the idle queue. Instead, job 31 was
scheduled to start, effectively preempting job 30,
which remains suspended.
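The expected sequence above can be written down as a toy model. This is a minimal sketch, not Maui's actual scheduling algorithm. It assumes a single 4-processor node, a FIFO idle queue, and a hypothetical rule that a suspended job is resumed before any idle job is started. The helper names (`schedule`, `preempt`, `free_procs`) are made up for illustration.

```python
from dataclasses import dataclass

NODE_PROCS = 4  # the single compute node in the test has 4 processors


@dataclass
class Job:
    jobid: int
    procs: int
    priority: int = 0
    preemptor: bool = False
    state: str = "Idle"  # one of: Idle, Running, Suspended, Completed


def free_procs(jobs):
    """Processors not used by running jobs (in this model, suspended jobs release theirs)."""
    return NODE_PROCS - sum(j.procs for j in jobs if j.state == "Running")


def schedule(jobs):
    """One pass under the ASSUMED rule: resume suspended jobs before starting idle ones."""
    # sorted() is stable, so equal-priority jobs keep their queue order.
    for job in sorted(jobs, key=lambda j: (j.state != "Suspended", -j.priority)):
        if job.state in ("Suspended", "Idle") and job.procs <= free_procs(jobs):
            job.state = "Running"


def preempt(jobs, preemptor):
    """Suspend running non-preemptor jobs until the preemptor fits, then start it."""
    for victim in jobs:
        if preemptor.procs <= free_procs(jobs):
            break
        if victim.state == "Running" and not victim.preemptor:
            victim.state = "Suspended"
    if preemptor.procs <= free_procs(jobs):
        preemptor.state = "Running"


# Jobs 30 and 31 (long queue, priority 17 per the log); job 38 (fast queue).
j30, j31 = Job(30, 4, 17), Job(31, 4, 17)
jobs = [j30, j31]
schedule(jobs)            # job 30 starts, job 31 waits
j38 = Job(38, 4, preemptor=True)
jobs.append(j38)
preempt(jobs, j38)        # job 30 is suspended, job 38 runs
j38.state = "Completed"   # job 38 finishes
schedule(jobs)            # expected: job 30 resumes, job 31 still waits
print({j.jobid: j.state for j in jobs})
# {30: 'Running', 31: 'Idle', 38: 'Completed'}
```

In this model job 30 resumes because suspended jobs are considered first. The log below shows Maui starting job 31 instead.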


**** Relevant maui.log info ****

#### Job 38 just finished
INFO:     active PBS job 38 has been removed from the queue.  assuming successful completion
MJobProcessCompleted(38)
.
.
INFO:     job usage sent for job '38'
MJobRemove(38)
MResDestroy(38)
MResChargeAllocation(38,2)
MJobDestroy(38)


#### Job 30 had been preempted by 38, so it is suspended,
#### but it should run now that 38 has finished and the rest
#### of the jobs in the queue are not preemptors.

MClusterUpdateNodeState()
MQueueSelectAllJobs(Q,HARD,ALL,JIList,DP,Msg)
INFO:     job '30' Priority:       17
INFO:     Cred:      0(00.0)  FS:      0(00.0)  Attr:      0(00.0)  Serv:     17(00.0)  Targ:      0(00.0)  Res:      0(00.0)  Us:      0(00.

INFO:     job '31' Priority:       17
INFO:     Cred:      0(00.0)  FS:      0(00.0)  Attr:      0(00.0)  Serv:     17(00.0)  Targ:      0(00.0)  Res:      0(00.0)  Us:      0(00.


MStatClearUsage([NONE],Idle)
INFO:     total jobs selected (ALL): 6/6
MQueueSelectAllJobs(Q,SOFT,ALL,JIList,DP,Msg)
INFO:     job '30' Priority:       17
INFO:     Cred:      0(00.0)  FS:      0(00.0)  Attr:      0(00.0)  Serv:     17(00.0)  Targ:      0(00.0)  Res:      0(00.0)  Us:      0(00.0)
INFO:     job '31' Priority:       17
INFO:     Cred:      0(00.0)  FS:      0(00.0)  Attr:      0(00.0)  Serv:     17(00.0)  Targ:      0(00.0)  Res:      0(00.0)  Us:      0(00.0)
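To check the priority claim against the log, a small script can pull the "job 'N' Priority:" lines out of maui.log. This is a sketch that assumes the log lines look exactly like the ones quoted above. The function names are hypothetical. It shows that jobs 30 and 31 are tied at 17, so the priority ordering alone gives no reason to prefer job 31.

```python
import re

# Priority lines copied from the MQueueSelectAllJobs(Q,SOFT,...) pass in the log above.
LOG = """\
INFO:     job '30' Priority:       17
INFO:     job '31' Priority:       17
"""

PRIORITY_RE = re.compile(r"job '(\d+)' Priority:\s+(-?\d+)")


def job_priorities(text):
    """Map job id -> reported priority; a later line for the same job overwrites an earlier one."""
    return {int(m.group(1)): int(m.group(2)) for m in PRIORITY_RE.finditer(text)}


def top_priority_jobs(priorities):
    """Job ids sharing the highest priority, in ascending id order."""
    best = max(priorities.values())
    return sorted(job for job, prio in priorities.items() if prio == best)


prios = job_priorities(LOG)
print(prios, top_priority_jobs(prios))  # {30: 17, 31: 17} [30, 31]
```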


####   Why are inadequate tasks and nodes reported for job 30 if
#### the node is free, and why does that same node get assigned
#### to job 31?

MStatClearUsage([NONE],Idle)
INFO:     total jobs selected (ALL): 6/6
MQueueSelectJobs(SrcQ,DstQ,HARD,5120,4096,2140000000,EVERY,FReason,FALSE)
INFO:     total jobs selected in partition ALL: 6/6
INFO:     4 feasible tasks found for job 30:0 in partition DEFAULT (4 Needed)
INFO:     inadequate feasible tasks found for job 30:0 (0 < 4)
INFO:     inadequate nodes found for job 30:0 (0 < 1)
MQueueScheduleRJobs(Q)
MResDestroy(31)
MResChargeAllocation(31,2)
MQueueSelectJobs(SrcQ,DstQ,SOFT,5120,4096,2140000000,EVERY,FReason,TRUE)
INFO:     total jobs selected in partition ALL: 6/6
MQueueSelectJobs(SrcQ,DstQ,SOFT,5120,4096,2140000000,DEFAULT,FReason,TRUE)
INFO:     total jobs selected in partition DEFAULT: 6/6
MQueueScheduleIJobs(Q,DEFAULT)
INFO:     4 feasible tasks found for job 31:0 in partition DEFAULT (4 Needed)
INFO:     tasks located for job 31:  4 of 4 required (4 feasible)
MJobStart(31)
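The contradiction in this pass can be summarized per job with another small log parser. It is a sketch that assumes only the message formats quoted above, and the function name is hypothetical. It restates what the log says: job 30 has 4 feasible tasks, but then 0 usable tasks and 0 usable nodes. Job 31, evaluated afterwards against the same node, is started. The sketch does not explain why. It only makes the mismatch easy to spot across many scheduling iterations.

```python
import re
from collections import defaultdict

# Scheduling-pass lines copied from the maui.log excerpt above.
LOG = """\
INFO:     4 feasible tasks found for job 30:0 in partition DEFAULT (4 Needed)
INFO:     inadequate feasible tasks found for job 30:0 (0 < 4)
INFO:     inadequate nodes found for job 30:0 (0 < 1)
MQueueScheduleRJobs(Q)
MResDestroy(31)
MResChargeAllocation(31,2)
MQueueScheduleIJobs(Q,DEFAULT)
INFO:     4 feasible tasks found for job 31:0 in partition DEFAULT (4 Needed)
INFO:     tasks located for job 31:  4 of 4 required (4 feasible)
MJobStart(31)
"""

INADEQUATE_RE = re.compile(r"inadequate (.+?) found for job (\d+):\d+ \((\d+) < (\d+)\)")
FEASIBLE_RE = re.compile(r"(\d+) feasible tasks found for job (\d+):")
START_RE = re.compile(r"MJobStart\((\d+)\)")


def summarize(text):
    """Per job: feasible task count, 'inadequate' shortfalls (what, found, needed), started flag."""
    jobs = defaultdict(lambda: {"feasible": None, "inadequate": [], "started": False})
    for line in text.splitlines():
        if m := INADEQUATE_RE.search(line):
            what, job, found, needed = m.groups()
            jobs[int(job)]["inadequate"].append((what, int(found), int(needed)))
        elif m := FEASIBLE_RE.search(line):
            jobs[int(m.group(2))]["feasible"] = int(m.group(1))
        elif m := START_RE.search(line):
            jobs[int(m.group(1))]["started"] = True
    return dict(jobs)


for job, info in summarize(LOG).items():
    print(job, info)
```

The walrus operator requires Python 3.8 or later.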


**** maui.cfg ****
PREEMPTPOLICY         SUSPEND
QUEUETIMEWEIGHT       1
CREDWEIGHT            1
USERWEIGHT            1
GROUPWEIGHT           1
XFACTORWEIGHT         1
QOSWEIGHT             1
JOBPRIOACCRUALPOLICY  FULLPOLICY
XFACTORCAP            10000
XFMINWCLIMIT          0:01:00
CLASSCFG[long]        QDEF=long
CLASSCFG[fast]        QDEF=fast

QOSCFG[long]          QFLAGS=PREEMPTEE  PRIORITY=10
QOSCFG[fast]          QFLAGS=PREEMPTOR  PRIORITY=1000

NODEALLOCATIONPOLICY  MINRESOURCE
BACKFILLPOLICY        FIRSTFIT
RESERVATIONPOLICY     CURRENTHIGHEST
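The class-to-QOS chain in this config (CLASSCFG sets QDEF, and QOSCFG sets QFLAGS and PRIORITY) can be followed with a short script. This is a simplified sketch, not a full maui.cfg parser. It assumes only the `NAME[index]  KEY=VALUE ...` form used in these lines, and the function names are hypothetical. It confirms that both long-queue jobs (30 and 31) map to the PREEMPTEE QOS and only the fast queue maps to PREEMPTOR.

```python
import re

# The class/QOS lines from the maui.cfg above.
CFG = """\
CLASSCFG[long]      QDEF=long
CLASSCFG[fast]      QDEF=fast

QOSCFG[long]        QFLAGS=PREEMPTEE            PRIORITY=10
QOSCFG[fast]        QFLAGS=PREEMPTOR            PRIORITY=1000
"""

# Simplified: only handles "CLASSCFG[x]/QOSCFG[x]  KEY=VALUE KEY=VALUE ..." lines.
ENTRY_RE = re.compile(r"^(CLASSCFG|QOSCFG)\[(\w+)\]\s+(.*)$")


def parse_cfg(text):
    """Return {'CLASSCFG': {name: attrs}, 'QOSCFG': {name: attrs}}."""
    tables = {"CLASSCFG": {}, "QOSCFG": {}}
    for line in text.splitlines():
        m = ENTRY_RE.match(line.strip())
        if m:
            kind, name, rest = m.groups()
            tables[kind][name] = dict(pair.split("=", 1) for pair in rest.split())
    return tables


def class_roles(tables):
    """Class -> (default QOS, QOS flags, QOS priority), following QDEF into QOSCFG."""
    roles = {}
    for cls, attrs in tables["CLASSCFG"].items():
        qos_name = attrs.get("QDEF")
        qos = tables["QOSCFG"].get(qos_name, {})
        roles[cls] = (qos_name, qos.get("QFLAGS"), int(qos.get("PRIORITY", 0)))
    return roles


print(class_roles(parse_cfg(CFG)))
# {'long': ('long', 'PREEMPTEE', 10), 'fast': ('fast', 'PREEMPTOR', 1000)}
```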

