[Mauiusers] maui not scheduling even with available resources

Arnau Bria arnau at emergetux.net
Thu Feb 28 04:47:54 MST 2008


Hi,

our maui server (maui-server-3.2.6p19_20.snap.1182974819-4.slc3) is not
scheduling correctly. We have several queues, and each one requires
specific worker node (WN) resources. For example:

[root@pbs01 sbin]# qmgr -c "l q ifae"|grep resources_default.neednodes
	resources_default.neednodes = ifae
[root@pbs01 sbin]# qmgr -c "l q gshort"|grep resources_default.neednodes
	resources_default.neednodes = slc4

We have no free ifae WNs and many free slc4 slots:

# pbsnodes -a|grep -B2 ifae|grep -c free
0
# pbsnodes -a|grep -B2 slc4|grep -c free
46
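
The grep -B2 counts above depend on the properties line sitting exactly
two lines below the state line. A slightly more robust way to get the
same numbers is sketched below. It is only a sketch: the function name
count_free_by_property is made up, and it assumes the usual pbsnodes -a
layout (an unindented node name followed by indented "key = value"
lines). It counts free nodes, not free slots.

```shell
#!/bin/sh
# Count nodes in state "free" per property, reading `pbsnodes -a`-style
# output on stdin. Assumed layout: unindented node name, then indented
# "state = ..." and "properties = a,b,..." lines.
count_free_by_property() {
    awk '
        # Credit the node collected so far to each of its properties.
        function flush(   n, i, p) {
            if (!in_node || state != "free") return
            n = split(props, p, ",")
            for (i = 1; i <= n; i++) free[p[i]]++
        }
        # An unindented, non-empty line starts a new node record.
        /^[^ \t]/ { flush(); state = ""; props = ""; in_node = 1; next }
        /^[ \t]+state = /      { sub(/^[ \t]+state = /, "");      state = $0 }
        /^[ \t]+properties = / { sub(/^[ \t]+properties = /, ""); props = $0 }
        END { flush(); for (k in free) print k, free[k] }
    ' | sort
}
```

Usage would be: pbsnodes -a | count_free_by_property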

So jobs submitted to ifae cannot run, but jobs submitted to other queues
should.


The queue looks like:
IDLE JOBS----------------------
JOBNAME            USERNAME      STATE  PROC     WCLIMIT            QUEUETIME

3862162              ops002       Idle     1  3:00:00:00  Thu Feb 28 12:09:05
3862186              ops002       Idle     1  3:00:00:00  Thu Feb 28 12:11:11
3862201            dteam004       Idle     1  1:00:00:00  Thu Feb 28 12:12:36
3862202            dteam004       Idle     1  1:00:00:00  Thu Feb 28 12:12:38
3862203            dteam004       Idle     1  1:00:00:00  Thu Feb 28 12:13:28

If we check the first job:
# checkjob 3862162


checking job 3862162

State: Idle
Creds:  user:ops002  group:ops  class:ifae  qos:DEFAULT
WallTime: 00:00:00 of 3:00:00:00
SubmitTime: Thu Feb 28 12:09:05
  (Time Queued  Total: 00:04:57  Eligible: 00:04:57)

StartDate: -00:04:02  Thu Feb 28 12:10:00
Total Tasks: 1

Req[0]  TaskCount: 1  Partition: ALL
Network: [NONE]  Memory >= 0  Disk >= 0  Swap >= 0
Opsys: [NONE]  Arch: [NONE]  Features: [ifae]


IWD: [NONE]  Executable:  [NONE]
Bypass: 16  StartCount: 0
PartitionMask: [ALL]
Reservation '3862162' (2:14:18:42 -> 5:14:18:42  Duration: 3:00:00:00)
PE:  1.00  StartPriority:  10000
job cannot run in partition DEFAULT (idle procs do not meet requirements : 0 of 1 procs found)
idle procs: 203  feasible procs:   0

Rejection Reasons: [Features     :   63][State        :    6]

This job goes to ifae, so it is not able to run.


But the first job in the gshort queue:

# checkjob 3862206

checking job 3862206

State: Idle
Creds:  user:dteam004  group:dteam  class:gshort  qos:DEFAULT
WallTime: 00:00:00 of 1:00:00:00
SubmitTime: Thu Feb 28 12:13:34
  (Time Queued  Total: 00:10:33  Eligible: 00:10:33)

StartDate: -00:10:05  Thu Feb 28 12:14:02
Total Tasks: 1

Req[0]  TaskCount: 1  Partition: ALL
Network: [NONE]  Memory >= 0  Disk >= 0  Swap >= 0
Opsys: [NONE]  Arch: [NONE]  Features: [slc4]


IWD: [NONE]  Executable:  [NONE]
Bypass: 2  StartCount: 0
PartitionMask: [ALL]
PE:  1.00  StartPriority:  10000
job can run in partition DEFAULT (48 procs available.  1 procs required)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

It should run, since there are free slots, but it stays idle forever...
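
To compare such reports side by side, the job id, class and verdict can
be pulled out with a small helper. Again only a sketch: checkjob_summary
is a made-up name, and the patterns assume the checkjob layout quoted
above ("checking job", "Creds:", "job can run" / "job cannot run").

```shell
#!/bin/sh
# Summarize one `checkjob` report read from stdin as:
#   <jobid> <class> <can-run|cannot-run|unknown>
checkjob_summary() {
    awk '
        /^checking job / { id = $3 }
        # Pick the class:<name> token out of the Creds line.
        /^Creds:/ {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^class:/) { cls = $i; sub(/^class:/, "", cls) }
        }
        /^job cannot run / { verdict = "cannot-run" }
        /^job can run /    { verdict = "can-run" }
        END { print id, cls, (verdict == "" ? "unknown" : verdict) }
    '
}
```

Usage would be: checkjob 3862206 | checkjob_summary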


Our cluster looks like:

Queue            Memory CPU Time Walltime Node  Run Que Lm  State
---------------- ------ -------- -------- ----  --- --- --  -----
gshort             --   12:00:00 24:00:00   --  115 106 --   E R
ifae               --   48:00:00 72:00:00   --   24  48 --   E R
                                               ----- -----
                                                 195   250

and the number of running jobs starts to decrease...


So, our workaround is to set MAXPROC for ifae in maui.cfg. As we only
have 24 slots for ifae, we set this limit:

CLASSCFG[ifae]          MAXPROC=24

We restart maui, and then:

# qstat -q

server: pbs01.pic.es

Queue            Memory CPU Time Walltime Node  Run Que Lm  State
---------------- ------ -------- -------- ----  --- --- --  -----
gshort             --   12:00:00 24:00:00   --  163  58 --   E R
ifae               --   48:00:00 72:00:00   --   24  48 --   E R
                                               ----- -----
                                                 244   202
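
To follow the Run and Que columns over time (for example before and
after the maui.cfg change), the per-queue numbers can be extracted from
qstat -q. This is a sketch under assumptions: queue_counts is a made-up
name, and it relies on data rows having exactly ten whitespace-separated
fields with Run and Que in fields 6 and 7, as in the tables above.

```shell
#!/bin/sh
# Print "<queue> <running> <queued>" for each queue row of `qstat -q`
# output read from stdin. Header, separator and total lines are skipped
# because their 6th and 7th fields are not both plain numbers.
queue_counts() {
    awk 'NF == 10 && $6 ~ /^[0-9]+$/ && $7 ~ /^[0-9]+$/ { print $1, $6, $7 }'
}
```

Usage would be: qstat -q | queue_counts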


So, our question is: why doesn't maui schedule jobs even though there
are available resources? And why does maui start behaving correctly once
we set the MAXPROC limit?

Feel free to ask for any configuration parameter I forgot to send...


TIA,
Arnau

