[Mauiusers] slurm + maui problem.

vesor at 163.com vesor at 163.com
Fri Dec 1 01:49:04 MST 2006


I am using SLURM 1.1.19 and Maui 3.2.6p18.
I configured 'node10' with 2 processors and 'node7' with 4.
When I run "srun -n6 -t 2 hostname", there is no problem.
But when I run "srun -N2 -t 2 hostname", the job cannot get enough resources to run.

maui.cfg:
# maui.cfg 3.2.6p18
SERVERHOST            node10
ADMIN1                root

RMCFG[node10] TYPE=WIKI
RMPORT            7321            # or whatever you choose as a port
RMHOST            node10
RMAUTHTYPE[node10]  NONE

PARTITIONMODE ON
NODECFG[node10]   PARTITION=test
NODECFG[node7]    PARTITION=test

AMCFG[bank]  TYPE=NONE
RMPOLLINTERVAL        00:00:15
SERVERPORT            42559
LOGFILE               maui.log
LOGFILEMAXSIZE        10000000
LOGLEVEL              7

QUEUETIMEWEIGHT       1 
BACKFILLPOLICY        FIRSTFIT
RESERVATIONPOLICY     CURRENTHIGHEST
NODEALLOCATIONPOLICY  MINRESOURCE

######################
[root@node10 root]# showq
ACTIVE JOBS--------------------
JOBNAME            USERNAME      STATE  PROC   REMAINING            STARTTIME


     0 Active Jobs       0 of    6 Processors Active (0.00%)
                         0 of    2 Nodes Active      (0.00%)

IDLE JOBS----------------------
JOBNAME            USERNAME      STATE  PROC     WCLIMIT            QUEUETIME

14                     root       Idle     1    00:02:00  Fri Dec  1 15:20:48

1 Idle Job 

BLOCKED JOBS----------------
JOBNAME            USERNAME      STATE  PROC     WCLIMIT            QUEUETIME


Total Jobs: 1   Active Jobs: 0   Idle Jobs: 1   Blocked Jobs: 0


[root@node10 root]# checkjob 14

checking job 14

State: Idle
Creds:  user:root  group:root  qos:DEFAULT
WallTime: 00:00:00 of 00:02:00
SubmitTime: Fri Dec  1 15:20:48
  (Time Queued  Total: 00:00:04  Eligible: 00:00:04)

StartDate: 00:00:01  Fri Dec  1 15:20:53
Total Tasks: 1

Req[0]  TaskCount: 1  Partition: ALL
Network: [NONE]  Memory >= 1M  Disk >= 1M  Swap >= 0
Opsys: [NONE]  Arch: [NONE]  Features: [NONE]
NodeCount: 2


IWD: [NONE]  Executable:  [NONE]
Bypass: 0  StartCount: 0
PartitionMask: [test]
Reservation '14' (00:00:01 -> 00:02:01  Duration: 00:02:00)
PE:  1.00  StartPriority:  1
cannot select job 14 for partition test (startdate in '00:00:01')
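
Note that the checkjob output above reports "Total Tasks: 1" together with "NodeCount: 2", so the job appears to ask for more nodes than tasks. Whether that mismatch is the actual cause is not confirmed here. The following is only a minimal Python sketch for spotting it in captured checkjob text; the function name find_task_node_mismatch and the regular expressions are illustrative assumptions based solely on the field labels visible in the output above.

```python
import re

def find_task_node_mismatch(checkjob_text):
    """Return (task_count, node_count) if NodeCount exceeds TaskCount, else None.

    Assumes the 'TaskCount: N' and 'NodeCount: N' labels seen in the
    checkjob output above; other Maui versions may format these differently.
    """
    task = re.search(r"TaskCount:\s*(\d+)", checkjob_text)
    node = re.search(r"NodeCount:\s*(\d+)", checkjob_text)
    if task is None or node is None:
        return None  # fields not present, nothing to compare
    task_count, node_count = int(task.group(1)), int(node.group(1))
    if node_count > task_count:
        return task_count, node_count
    return None

# Two lines copied from the checkjob output for job 14.
sample = """Req[0]  TaskCount: 1  Partition: ALL
NodeCount: 2
"""
print(find_task_node_mismatch(sample))
```

For the job 14 excerpt this flags 1 task against 2 nodes, which matches the "tasks located for job 14:  2 of 1 required" line in maui.log.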

############################
maui.log:

12/01 15:21:11 INFO:     nodelist[0] node10  2  6
12/01 15:21:11 INFO:     nodelist[1] node7  4  6
12/01 15:21:11 INFO:     ignoring pass 1 for job 14:0 (node set forced in feasible list)
12/01 15:21:11 INFO:     evaluating nodes on alloc iteration 0 for job 14:0
12/01 15:21:11 INFO:     evaluating nodes on alloc iteration 1 for job 14:0
12/01 15:21:11 INFO:     evaluating nodes on alloc iteration 2 for job 14:0
12/01 15:21:11 INFO:     evaluating nodes on alloc iteration 3 for job 14:0
12/01 15:21:11 INFO:     evaluating nodes on alloc iteration 4 for job 14:0
12/01 15:21:11 INFO:     evaluating nodes on alloc iteration 5 for job 14:0
12/01 15:21:11 INFO:     tasks located for job 14:  2 of 1 required (6 feasible)
12/01 15:21:11 INFO:     allocated MNode[000]x1 'node7' to 14:0
12/01 15:21:11 INFO:     allocated MNode[001]x1 'node10' to 14:0
12/01 15:21:11 MJobStart(14)
12/01 15:21:11 MJobDistributeTasks(14,node10,NodeList,TaskMap)
12/01 15:21:11 INFO:     0 node(s)/0 task(s) added to 14:0
12/01 15:21:11 ALERT:    inadequate tasks allocated to job
12/01 15:21:11 WARNING:  cannot distribute allocated tasks for job '14'
12/01 15:21:11 ERROR:    cannot start job '14' in partition test
12/01 15:21:11 MJobSetAttr(14,SysSMinTime,Value,0,3)
12/01 15:21:11 INFO:     system min start time set on job 14 for 00:00:01
12/01 15:21:11 MJobPReserve(14,test,ResCount,ResCountRej)
12/01 15:21:11 MJobReserve(14,Priority)
12/01 15:21:11 MPolicyGetEStartTime(14,ALL,SOFT,Time)
12/01 15:21:11 INFO:     policy start time found for job 14 in 00:00:01
12/01 15:21:11 MJobGetEStartTime(14,NULL,NodeCount,TaskCount,MNodeList,1164957672)
12/01 15:21:11 MParGetTC(test,Avl,Cfg,Ded,Req,2140000000)
12/01 15:21:11 MJobGetRange(14,RQ,test,00:00:01,GRange,NULL,NodeMap,1,TRange)
12/01 15:21:11 MReqGetFNL(14,0,test,NULL,DstNL,NC,TC,2140000000,0)
12/01 15:21:11 MReqCheckResourceMatch(14,0,node10,NULL)
12/01 15:21:11 INFO:     node node10 can provide resources for job 14:0
12/01 15:21:11 MNodeCheckPolicies(14,node10,2)
12/01 15:21:11 MJobCheckNRes(14,node10,RQ[0],  INFINITY,TCAvail,1.000,RIndex,NULL,FeasCheck)
12/01 15:21:11 MReqCheckResourceMatch(14,0,node10,RIndex)
12/01 15:21:11 INFO:     node node10 can provide resources for job 14:0
12/01 15:21:11 INFO:     node node10 added to feasible list (2 tasks)
12/01 15:21:11 MReqCheckResourceMatch(14,0,node7,NULL)
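
At LOGLEVEL 7 the log is verbose, so it can help to pull out only the ALERT, WARNING and ERROR lines. The sketch below is a hedged example of doing that in Python; the function name extract_problems and the line pattern are assumptions derived from the "MM/DD HH:MM:SS LEVEL:  message" layout visible in the excerpt above, not from any Maui documentation.

```python
import re

# Pattern assumed from the excerpt: "12/01 15:21:11 ALERT:    message"
LINE_RE = re.compile(
    r"^(\d{2}/\d{2} \d{2}:\d{2}:\d{2}) (ALERT|WARNING|ERROR):\s+(.*)$"
)

def extract_problems(log_text):
    """Return a list of (timestamp, level, message) for problem lines only."""
    problems = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            problems.append((match.group(1), match.group(2), match.group(3)))
    return problems

# Lines copied from the maui.log excerpt for job 14.
log_sample = """12/01 15:21:11 INFO:     0 node(s)/0 task(s) added to 14:0
12/01 15:21:11 ALERT:    inadequate tasks allocated to job
12/01 15:21:11 WARNING:  cannot distribute allocated tasks for job '14'
12/01 15:21:11 ERROR:    cannot start job '14' in partition test
12/01 15:21:11 MJobSetAttr(14,SysSMinTime,Value,0,3)
"""

for timestamp, level, message in extract_problems(log_sample):
    print(timestamp, level, message)
```

To run it against a real file, read the file contents into a string and pass them to extract_problems; INFO lines and function traces such as MJobStart(14) are skipped.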



