[Mauiusers] Maui not scheduling valid jobs when nodes are available

Josh Butikofer josh at clusterresources.com
Thu Dec 14 07:22:57 MST 2006


Prakash,

It is not clear to me from this log file why the job's reservation cannot be made. Do you have any
existing reservations on the system? (Use showres to see.) Also, can you raise the log level to see
whether the logs give more detail? (Your maui.cfg has LOGLEVEL 3; set it to 6 or 7, restart Maui, and
run the test case again.)
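
At LOGLEVEL 6 or 7 the log grows quickly, so it can help to filter it down to the lines that
mention the job. The following is only a rough sketch in Python, not a Maui tool: the function
names job_problems and feasible_counts are made up for illustration, and it assumes each log
entry sits on a single line in the format shown in the excerpt quoted below (the quoted copy
has been wrapped by the mail client).

```python
import re

# Prefix of a severity-tagged entry, e.g.
# "12/08 10:24:08 ALERT:    job 39158 cannot run in any partition".
# Function-trace entries such as "MJobReserve(39158,Priority)" do not match.
LINE_RE = re.compile(
    r"^(?P<date>\d{2}/\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+):\s+(?P<msg>.*)$"
)

# "0 feasible tasks found for job 39158:0 in partition DEFAULT (2 Needed)"
FEASIBLE_RE = re.compile(
    r"(?P<found>\d+) feasible tasks found for job (?P<job>\d+):\d+ "
    r"in partition (?P<part>\S+) \((?P<needed>\d+) Needed\)"
)


def job_problems(lines, job_id, levels=("ERROR", "ALERT", "WARNING")):
    """Return (level, message) pairs at the given levels that mention job_id."""
    job_re = re.compile(r"\b" + re.escape(str(job_id)) + r"\b")
    hits = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and m.group("level") in levels and job_re.search(m.group("msg")):
            hits.append((m.group("level"), m.group("msg")))
    return hits


def feasible_counts(lines, job_id):
    """Return (partition, found, needed) for each 'feasible tasks' entry of job_id."""
    counts = []
    for line in lines:
        m = FEASIBLE_RE.search(line)
        if m and m.group("job") == str(job_id):
            counts.append(
                (m.group("part"), int(m.group("found")), int(m.group("needed")))
            )
    return counts
```

Both functions accept any iterable of lines, so an open file object for maui.log can be passed
directly; where that file lives depends on the installation.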

Regards,

-- 
Joshua Butikofer
Cluster Resources, Inc.

josh at clusterresources.com
Voice: (801) 717-3707
Fax:   (801) 717-3738
--------------------------


Prakash Velayutham wrote:
> Hello,
> 
> I am a new Maui user (I used the Torque scheduler before). I have
> Maui-3.2.6-13 with Torque-2.1.6. I have this same setup in 2 different
> clusters. In both clusters, the Torque server and the Maui scheduler
> (both run on the same server in each setup) are on a 32-bit SuSE 9.3 server.
> 
> In one of the setups, everything is working flawlessly.
> 
> In the other cluster, I am able to submit jobs like
> "qsub -l nodes=1 cpuload.sh".
> But if I change the resource list to something like
> "qsub -l nodes=1:opteron:ppn=2 cpuload.sh", Maui does not schedule the job.
> 
> Here is some output from the Maui logs.
> ##############################################################################################
> 12/08 10:24:08 MPBSJobLoad(39158,39158.x.y.z,J,TaskList,0)
> 12/08 10:24:08 MReqCreate(39158,SrcRQ,DstRQ,DoCreate)
> 12/08 10:24:08 INFO:     processing node request line '1:opteron:ppn=2'
> 12/08 10:24:08 MJobSetCreds(39158,xxx,users,)
> 12/08 10:24:08 INFO:     default QOS for job 39158 set to DEFAULT(0) (P:DEFAULT,U:[NONE],G:[NONE],A:[NONE],C:[NONE])
> 12/08 10:24:08 INFO:     default QOS for job 39158 set to DEFAULT(0) (P:DEFAULT,U:[NONE],G:[NONE],A:[NONE],C:[NONE])
> 12/08 10:24:08 INFO:     default QOS for job 39158 set to DEFAULT(0) (P:DEFAULT,U:[NONE],G:[NONE],A:[NONE],C:[NONE])
> 12/08 10:24:08 INFO:     job '39158' loaded:   2   litert    users     0       Idle   0 1165591448   [NONE] [NONE] [NONE] >=      0 >=      0 [opteron][xeon][1] 1165591448
> 12/08 10:24:08 INFO:     1 PBS jobs detected on RM FRUCTOSE
> 12/08 10:24:08 INFO:     jobs detected: 1
> 12/08 10:24:08 MStatClearUsage(node,Active)
> 12/08 10:24:08 MClusterUpdateNodeState()
> 12/08 10:24:08 MQueueSelectAllJobs(Q,HARD,ALL,JIList,DP,Msg)
> 12/08 10:24:08 ERROR:    job '39158' has NULL WCLimit field
> 12/08 10:24:08 INFO:     job '39158' Priority:        1
> 12/08 10:24:08 INFO:     Cred:      0(00.0)  FS:      0(00.0)  Attr:      0(00.0)  Serv:      0(00.0)  Targ:      0(00.0)  Res:      0(00.0)  Us:      0(00.0)
> 12/08 10:24:08 MStatClearUsage([NONE],Active)
> 12/08 10:24:08 INFO:     total jobs selected (ALL): 1/1
> 12/08 10:24:08 MQueueSelectAllJobs(Q,SOFT,ALL,JIList,DP,Msg)
> 12/08 10:24:08 ERROR:    job '39158' has NULL WCLimit field
> 12/08 10:24:08 INFO:     job '39158' Priority:        1
> 12/08 10:24:08 INFO:     Cred:      0(00.0)  FS:      0(00.0)  Attr:      0(00.0)  Serv:      0(00.0)  Targ:      0(00.0)  Res:      0(00.0)  Us:      0(00.0)
> 12/08 10:24:08 MStatClearUsage([NONE],Idle)
> 12/08 10:24:08 INFO:     total jobs selected (ALL): 1/1
> 12/08 10:24:08 MQueueSelectJobs(SrcQ,DstQ,HARD,5120,4096,2140000000,EVERY,FReason,FALSE)
> 12/08 10:24:08 INFO:     total jobs selected in partition ALL: 1/1
> 12/08 10:24:08 MQueueScheduleRJobs(Q)
> 12/08 10:24:08 MQueueSelectJobs(SrcQ,DstQ,SOFT,5120,4096,2140000000,EVERY,FReason,TRUE)
> 12/08 10:24:08 INFO:     total jobs selected in partition ALL: 1/1
> 12/08 10:24:08 MQueueSelectJobs(SrcQ,DstQ,SOFT,5120,4096,2140000000,DEFAULT,FReason,TRUE)
> 12/08 10:24:08 INFO:     total jobs selected in partition DEFAULT: 1/1
> 12/08 10:24:08 MQueueScheduleIJobs(Q,DEFAULT)
> 12/08 10:24:08 INFO:     0 feasible tasks found for job 39158:0 in partition DEFAULT (2 Needed)
> 12/08 10:24:08 MJobPReserve(39158,DEFAULT,ResCount,ResCountRej)
> 12/08 10:24:08 MJobReserve(39158,Priority)
> 12/08 10:24:08 INFO:     0 feasible tasks found for job 39158:0 in partition DEFAULT (2 Needed)
> 12/08 10:24:08 ALERT:    job 39158 cannot run in any partition
> 12/08 10:24:08 ALERT:    cannot create new reservation for job 39158 (shape[1] 2)
> 12/08 10:24:08 ALERT:    cannot create new reservation for job 39158
> 12/08 10:24:08 MJobSetHold(39158,16,1:00:00,NoResources,cannot create reservation for job '39158' (intital reservation attempt))
> 12/08 10:24:08 ALERT:    job '39158' cannot run (deferring job for 3600 seconds)
> 12/08 10:24:08 WARNING:  cannot reserve priority job '39158'
> #################################################################################################################
> 
> Here is maui.cfg:
> #################################################################################################################
> # maui.cfg 3.2.6p13
> 
> SERVERHOST           fructose.cchmc.org
> ADMIN1                root
> RMCFG[X] TYPE=PBS HOST=x.y.z PORT=15001 EPORT=15003
> AMCFG[bank]  TYPE=NONE
> JOBNODEMATCHPOLICY      EXACTNODE
> RMPOLLINTERVAL        00:00:30
> SERVERPORT            42559
> SERVERMODE            TEST
> LOGFILE               maui.log
> LOGFILEMAXSIZE        10000000
> LOGLEVEL              3
> QUEUETIMEWEIGHT       1
> BACKFILLPOLICY        FIRSTFIT
> RESERVATIONPOLICY     CURRENTHIGHEST
> NODEALLOCATIONPOLICY  MINRESOURCE
> #################################################################################################################
> 
> Thanks for any help,
> Prakash
> _______________________________________________
> mauiusers mailing list
> mauiusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/mauiusers

