[Mauiusers] Maui not scheduling valid jobs when nodes are available

Prakash Velayutham prakash.velayutham at cchmc.org
Fri Dec 8 10:27:55 MST 2006


Hello,

I am a new Maui user (I was using the Torque scheduler before). I have
Maui-3.2.6-13 with Torque-2.1.6, and I have this same setup in 2
different clusters. In both clusters, the Torque server and the Maui
scheduler run on the same machine, a 32-bit SuSE 9.3 server.

In one of the setups, everything is working flawlessly.

In the other cluster, I am able to submit jobs like "qsub -l nodes=1
cpuload.sh". But if I change the resource list to something like
"qsub -l nodes=1:opteron:ppn=2 cpuload.sh", Maui does not schedule the
job.

Here is some output from the Maui logs.
##############################################################################################
12/08 10:24:08 MPBSJobLoad(39158,39158.x.y.z,J,TaskList,0)
12/08 10:24:08 MReqCreate(39158,SrcRQ,DstRQ,DoCreate)
12/08 10:24:08 INFO:     processing node request line '1:opteron:ppn=2'
12/08 10:24:08 MJobSetCreds(39158,xxx,users,)
12/08 10:24:08 INFO:     default QOS for job 39158 set to DEFAULT(0)
(P:DEFAULT,U:[NONE],G:[NONE],A:[NONE],C:[NONE])
12/08 10:24:08 INFO:     default QOS for job 39158 set to DEFAULT(0)
(P:DEFAULT,U:[NONE],G:[NONE],A:[NONE],C:[NONE])
12/08 10:24:08 INFO:     default QOS for job 39158 set to DEFAULT(0)
(P:DEFAULT,U:[NONE],G:[NONE],A:[NONE],C:[NONE])
12/08 10:24:08 INFO:     job '39158' loaded:   2   litert    users     
0       Idle   0 1165591448   [NONE] [NONE] [NONE] >=      0 >=      0
[opteron][xeon][1] 1165591448
12/08 10:24:08 INFO:     1 PBS jobs detected on RM FRUCTOSE
12/08 10:24:08 INFO:     jobs detected: 1
12/08 10:24:08 MStatClearUsage(node,Active)
12/08 10:24:08 MClusterUpdateNodeState()
12/08 10:24:08 MQueueSelectAllJobs(Q,HARD,ALL,JIList,DP,Msg)
12/08 10:24:08 ERROR:    job '39158' has NULL WCLimit field
12/08 10:24:08 INFO:     job '39158' Priority:        1
12/08 10:24:08 INFO:     Cred:      0(00.0)  FS:      0(00.0) 
Attr:      0(00.0)  Serv:      0(00.0)  Targ:      0(00.0)  Res:     
0(00.0)  Us:      0(00.0)
12/08 10:24:08 MStatClearUsage([NONE],Active)
12/08 10:24:08 INFO:     total jobs selected (ALL): 1/1
12/08 10:24:08 MQueueSelectAllJobs(Q,SOFT,ALL,JIList,DP,Msg)
12/08 10:24:08 ERROR:    job '39158' has NULL WCLimit field
12/08 10:24:08 INFO:     job '39158' Priority:        1
12/08 10:24:08 INFO:     Cred:      0(00.0)  FS:      0(00.0) 
Attr:      0(00.0)  Serv:      0(00.0)  Targ:      0(00.0)  Res:     
0(00.0)  Us:      0(00.0)
12/08 10:24:08 MStatClearUsage([NONE],Idle)
12/08 10:24:08 INFO:     total jobs selected (ALL): 1/1
12/08 10:24:08
MQueueSelectJobs(SrcQ,DstQ,HARD,5120,4096,2140000000,EVERY,FReason,FALSE)
12/08 10:24:08 INFO:     total jobs selected in partition ALL: 1/1
12/08 10:24:08 MQueueScheduleRJobs(Q)
12/08 10:24:08
MQueueSelectJobs(SrcQ,DstQ,SOFT,5120,4096,2140000000,EVERY,FReason,TRUE)
12/08 10:24:08 INFO:     total jobs selected in partition ALL: 1/1
12/08 10:24:08
MQueueSelectJobs(SrcQ,DstQ,SOFT,5120,4096,2140000000,DEFAULT,FReason,TRUE)
12/08 10:24:08 INFO:     total jobs selected in partition DEFAULT: 1/1
12/08 10:24:08 MQueueScheduleIJobs(Q,DEFAULT)
12/08 10:24:08 INFO:     0 feasible tasks found for job 39158:0 in
partition DEFAULT (2 Needed)
12/08 10:24:08 MJobPReserve(39158,DEFAULT,ResCount,ResCountRej)
12/08 10:24:08 MJobReserve(39158,Priority)
12/08 10:24:08 INFO:     0 feasible tasks found for job 39158:0 in
partition DEFAULT (2 Needed)
12/08 10:24:08 ALERT:    job 39158 cannot run in any partition
12/08 10:24:08 ALERT:    cannot create new reservation for job 39158
(shape[1] 2)
12/08 10:24:08 ALERT:    cannot create new reservation for job 39158
12/08 10:24:08 MJobSetHold(39158,16,1:00:00,NoResources,cannot create
reservation for job '39158' (intital reservation attempt))
12/08 10:24:08 ALERT:    job '39158' cannot run (deferring job for 3600
seconds)
12/08 10:24:08 WARNING:  cannot reserve priority job '39158'
#################################################################################################################
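Since the excerpt above is long and my mail client wrapped some entries,
here is a minimal Python sketch for pulling out just the entries that
look relevant (the ERROR/ALERT/WARNING lines and the "feasible tasks"
count). This is not part of Maui; the function names unwrap() and
problems() are made up for illustration, and it assumes every log entry
starts with a "MM/DD HH:MM:SS" timestamp as in the excerpt.

```python
import re

# Assumption: a Maui log entry starts with a timestamp like "12/08 10:24:08".
TIMESTAMP = re.compile(r"^\d{2}/\d{2} \d{2}:\d{2}:\d{2}")

def unwrap(lines):
    """Rejoin entries that were wrapped across several lines."""
    entries = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and the '####' separators around the excerpt.
        if not line or line.startswith("#"):
            continue
        if TIMESTAMP.match(line) or not entries:
            entries.append(line)
        else:
            # No timestamp: treat it as a continuation of the previous entry.
            entries[-1] += " " + line
    return entries

def problems(entries):
    """Keep entries flagged ERROR/ALERT/WARNING or reporting feasible tasks."""
    keep = re.compile(r"\b(ERROR|ALERT|WARNING):|feasible tasks found")
    return [e for e in entries if keep.search(e)]

# Three entries copied from the excerpt above, one of them wrapped.
SAMPLE = """\
12/08 10:24:08 MQueueScheduleIJobs(Q,DEFAULT)
12/08 10:24:08 INFO:     0 feasible tasks found for job 39158:0 in
partition DEFAULT (2 Needed)
12/08 10:24:08 ALERT:    job 39158 cannot run in any partition
"""

for entry in problems(unwrap(SAMPLE.splitlines())):
    print(entry)
```

To run it against the real file, replace SAMPLE.splitlines() with
open("maui.log"), using whatever path your LOGFILE setting resolves to.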

Here is maui.cfg:
#################################################################################################################
# maui.cfg 3.2.6p13

SERVERHOST           fructose.cchmc.org
ADMIN1                root
RMCFG[X] TYPE=PBS HOST=x.y.z PORT=15001 EPORT=15003
AMCFG[bank]  TYPE=NONE
JOBNODEMATCHPOLICY      EXACTNODE
RMPOLLINTERVAL        00:00:30
SERVERPORT            42559
SERVERMODE            TEST
LOGFILE               maui.log
LOGFILEMAXSIZE        10000000
LOGLEVEL              3
QUEUETIMEWEIGHT       1
BACKFILLPOLICY        FIRSTFIT
RESERVATIONPOLICY     CURRENTHIGHEST
NODEALLOCATIONPOLICY  MINRESOURCE
#################################################################################################################

Thanks for any help,
Prakash

