[Mauiusers] Job not running on secondary partition

Matthew Britt msbritt at umich.edu
Thu Dec 2 11:40:56 MST 2004


	From: 	  msbritt at umich.edu
	Subject: 	Job not running on secondary partition
	Date: 	December 1, 2004 11:41:58 PM EST
	To: 	  mauiusers at supercluster.org

We have a problem with jobs not starting in a secondary partition, 
using maui-3.2.6p9 and PBSPro 5.4.0.  Once the primary partition is 
full, the next job (or set of jobs; the count seems to follow 
RESERVATIONDEPTH) has a reservation created for it, even though plenty 
of resources are available in the secondary partition.  Any subsequent 
jobs run in the secondary partition without delay.  We've tested this 
with RESERVATIONDEPTH set to 0, 1 and 2, which results in 0, 1 or 2 
jobs being scheduled to run via reservation rather than running 
immediately.

Is there a method/configuration which will automatically run the job in 
the secondary partition?

Here are the configs:

BACKFILLPOLICY        FIRSTFIT
RESERVATIONPOLICY     CURRENTHIGHEST
RESERVATIONDEPTH       1

# Try to make jobs run on one processor type, if possible
NODEALLOCATIONPOLICY  MINRESOURCE

SYSCFG                  PLIST=

CLASSCFG[staff] PLIST=STAFF:GENERAL PDEF=STAFF


NODECFG[node001m] MAXJOB=1 PROCSPEED=2600 PARTITION=GENERAL
NODECFG[node002m] MAXJOB=1 PROCSPEED=2600 PARTITION=GENERAL
NODECFG[node003m] MAXJOB=1 PROCSPEED=1600 PARTITION=STAFF   # This is the only node in the STAFF partition
NODECFG[node004m] MAXJOB=1 PROCSPEED=2600 PARTITION=GENERAL
NODECFG[node005m] MAXJOB=1 PROCSPEED=1600 PARTITION=GENERAL
NODECFG[node006m] MAXJOB=1 PROCSPEED=2600 PARTITION=GENERAL
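
To double-check the partition layout, here is a minimal Python sketch 
that tallies NODECFG entries per PARTITION from the lines shown above.  
It is not part of Maui; the function name nodes_per_partition and the 
regular expression are our own assumptions about the line format, and 
only the node lines quoted in this message are counted.

```python
import re
from collections import Counter

# Assumed line shape: NODECFG[<node>] KEY=VALUE ... [# comment]
NODECFG_RE = re.compile(r"^NODECFG\[(?P<node>[^\]]+)\]\s+(?P<attrs>[^#]*)")

def nodes_per_partition(config_text):
    """Count NODECFG entries per PARTITION value; trailing comments are ignored."""
    counts = Counter()
    for line in config_text.splitlines():
        match = NODECFG_RE.match(line.strip())
        if not match:
            continue  # not a NODECFG line
        attrs = dict(
            kv.split("=", 1) for kv in match.group("attrs").split() if "=" in kv
        )
        if "PARTITION" in attrs:
            counts[attrs["PARTITION"]] += 1
    return counts

SAMPLE = """\
NODECFG[node001m] MAXJOB=1 PROCSPEED=2600 PARTITION=GENERAL
NODECFG[node002m] MAXJOB=1 PROCSPEED=2600 PARTITION=GENERAL
NODECFG[node003m] MAXJOB=1 PROCSPEED=1600 PARTITION=STAFF   # only STAFF node
NODECFG[node004m] MAXJOB=1 PROCSPEED=2600 PARTITION=GENERAL
NODECFG[node005m] MAXJOB=1 PROCSPEED=1600 PARTITION=GENERAL
NODECFG[node006m] MAXJOB=1 PROCSPEED=2600 PARTITION=GENERAL
"""

print(dict(nodes_per_partition(SAMPLE)))  # {'GENERAL': 5, 'STAFF': 1}
```

With a single MAXJOB=1 node in STAFF, one running job is enough to fill 
that partition.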




Here's the checkjob output:
*******first job - runs in primary partition *********
State: Running
Creds:  user:msbritt  group:users  class:staff  qos:DEFAULT
WallTime: 00:00:02 of 00:10:00
SubmitTime: Wed Dec  1 23:30:06
   (Time Queued  Total: 00:00:01  Eligible: 00:00:01)

StartTime: Wed Dec  1 23:30:07
Total Tasks: 1

Req[0]  TaskCount: 1  Partition: STAFF
Network: [NONE]  Memory >= 0  Disk >= 0  Swap >= 0
Opsys: [NONE]  Arch: [NONE]  Features: [NONE]
NodeCount: 1
Allocated Nodes:
[node003m:1]


IWD: [NONE]  Executable:  [NONE]
Bypass: 0  StartCount: 1
PartitionMask: [GENERAL][STAFF]
Flags:       RESTARTABLE

Reservation '129683' (-00:00:02 -> 00:09:58  Duration: 00:10:00)
PE:  1.00  StartPriority:  1

**************second job - the one that gets "stuck" ***************
checking job 129684

State: Idle
Creds:  user:msbritt  group:users  class:staff  qos:DEFAULT
WallTime: 00:00:00 of 00:10:00
SubmitTime: Wed Dec  1 23:30:07
   (Time Queued  Total: 00:01:02  Eligible: 00:01:02)

Total Tasks: 1

Req[0]  TaskCount: 1  Partition: ALL
Network: [NONE]  Memory >= 0  Disk >= 0  Swap >= 0
Opsys: [NONE]  Arch: [NONE]  Features: [NONE]
NodeCount: 1


IWD: [NONE]  Executable:  [NONE]
Bypass: 1  StartCount: 0
PartitionMask: [GENERAL][STAFF]
Flags:       RESTARTABLE

Reservation '129684' (00:00:00 -> 00:10:00  Duration: 00:10:00)
PE:  1.00  StartPriority:  1
job can run in partition GENERAL (9 procs available.  1 procs required)
job cannot run in partition STAFF (insufficient idle procs available: 0 < 1)
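
The symptom in the output above (an Idle job holding a reservation 
while checkjob itself reports a partition it can run in) can be 
spotted mechanically.  The following is a rough Python sketch, not a 
Maui tool: the function name stuck_with_free_partition is made up, and 
the patterns assume the checkjob wording is exactly as pasted here.

```python
import re

# Patterns assume the checkjob text format quoted in this message.
STATE_RE = re.compile(r"^State:\s*(?P<state>\S+)", re.MULTILINE)
RESERVATION_RE = re.compile(r"^Reservation '(?P<id>[^']+)'", re.MULTILINE)
CAN_RUN_RE = re.compile(
    r"job can run in partition (?P<part>\S+) "
    r"\((?P<avail>\d+) procs available\.\s+(?P<need>\d+) procs required\)"
)

def stuck_with_free_partition(checkjob_text):
    """Return partitions an Idle, reserved job could run in right now."""
    state = STATE_RE.search(checkjob_text)
    if state is None or state.group("state") != "Idle":
        return []  # only idle jobs are of interest
    if RESERVATION_RE.search(checkjob_text) is None:
        return []  # no reservation, so not the symptom described
    return [
        m.group("part")
        for m in CAN_RUN_RE.finditer(checkjob_text)
        if int(m.group("avail")) >= int(m.group("need"))
    ]

SAMPLE = """\
State: Idle
Reservation '129684' (00:00:00 -> 00:10:00  Duration: 00:10:00)
job can run in partition GENERAL (9 procs available.  1 procs required)
job cannot run in partition STAFF (insufficient idle procs available: 0 < 1)
"""

print(stuck_with_free_partition(SAMPLE))  # ['GENERAL']
```

For job 129684 this reports GENERAL, which is the partition we expected 
the job to start in immediately.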

**************third job - runs in the secondary partition*******************
checking job 129685

State: Running
Creds:  user:msbritt  group:users  class:staff  qos:DEFAULT
WallTime: 00:01:41 of 00:10:00
SubmitTime: Wed Dec  1 23:30:08
   (Time Queued  Total: 00:00:01  Eligible: 00:00:01)

StartTime: Wed Dec  1 23:30:09
Total Tasks: 1

Req[0]  TaskCount: 1  Partition: GENERAL
Network: [NONE]  Memory >= 0  Disk >= 0  Swap >= 0
Opsys: [NONE]  Arch: [NONE]  Features: [NONE]
NodeCount: 1
Allocated Nodes:
[node065m:1]


IWD: [NONE]  Executable:  [NONE]
Bypass: 0  StartCount: 1
PartitionMask: [GENERAL][STAFF]
Flags:       BACKFILL RESTARTABLE

Reservation '129685' (-00:01:31 -> 00:08:29  Duration: 00:10:00)
PE:  1.00  StartPriority:  1


Thanks for any help!

  -  matt


