[Mauiusers] Job not running on secondary partition
Matthew Britt
msbritt at umich.edu
Thu Dec 2 11:40:56 MST 2004
From: msbritt at umich.edu
Subject: Job not running on secondary partition
Date: December 1, 2004 11:41:58 PM EST
To: mauiusers at supercluster.org
We have a problem with jobs not starting in a secondary partition,
using maui-3.2.6p9 and PBSPro 5.4.0. After the primary partition is
full, the next job (or set of jobs; the number seems to depend on
RESERVATIONDEPTH) has a reservation created for it, even though plenty
of resources are available in the secondary partition. Any subsequent
jobs run in the secondary partition without delay. We've tested this by
setting RESERVATIONDEPTH to 0, 1 and 2, which results in 0, 1 or 2 jobs
being scheduled to run via reservation rather than running
immediately.
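To make the pattern concrete, here is a toy Python model of what we observe. It is only a sketch of the symptom, not Maui's actual scheduling algorithm; the function name `schedule` and its parameters are made up for illustration.

```python
def schedule(num_jobs, primary_free, secondary_free, reservation_depth):
    """Toy model of the observed behaviour (NOT Maui's real algorithm).

    Jobs are considered in submission order. Each job needs one processor.
    """
    outcomes = []
    reserved = 0  # number of jobs currently holding a future reservation
    for _ in range(num_jobs):
        if primary_free > 0:
            # Room in the primary (default) partition: start immediately.
            primary_free -= 1
            outcomes.append("run:primary")
        elif reserved < reservation_depth:
            # Primary is full: the job gets a reservation instead of
            # starting in the secondary partition.
            reserved += 1
            outcomes.append("reserved")
        elif secondary_free > 0:
            # Reservation slots are used up: later jobs start in the
            # secondary partition right away.
            secondary_free -= 1
            outcomes.append("run:secondary")
        else:
            outcomes.append("idle")
    return outcomes


# One free primary processor, nine free secondary processors, depth 1:
print(schedule(3, 1, 9, 1))  # ['run:primary', 'reserved', 'run:secondary']
```

With a depth of 0 the second job goes straight to the secondary partition in this model, which matches what we see when RESERVATIONDEPTH is 0.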
Is there a method/configuration which will automatically run the job in
the secondary partition?
Here are the configs:
BACKFILLPOLICY FIRSTFIT
RESERVATIONPOLICY CURRENTHIGHEST
RESERVATIONDEPTH 1
# Try to make jobs run on one processor type, if possible
NODEALLOCATIONPOLICY MINRESOURCE
SYSCFG PLIST=
CLASSCFG[staff] PLIST=STAFF:GENERAL PDEF=STAFF
NODECFG[node001m] MAXJOB=1 PROCSPEED=2600 PARTITION=GENERAL
NODECFG[node002m] MAXJOB=1 PROCSPEED=2600 PARTITION=GENERAL
NODECFG[node003m] MAXJOB=1 PROCSPEED=1600 PARTITION=STAFF # This is the only node in the STAFF partition
NODECFG[node004m] MAXJOB=1 PROCSPEED=2600 PARTITION=GENERAL
NODECFG[node005m] MAXJOB=1 PROCSPEED=1600 PARTITION=GENERAL
NODECFG[node006m] MAXJOB=1 PROCSPEED=2600 PARTITION=GENERAL
Here's the checkjob output:
*******first job - runs in primary partition *********
State: Running
Creds: user:msbritt group:users class:staff qos:DEFAULT
WallTime: 00:00:02 of 00:10:00
SubmitTime: Wed Dec 1 23:30:06
(Time Queued Total: 00:00:01 Eligible: 00:00:01)
StartTime: Wed Dec 1 23:30:07
Total Tasks: 1
Req[0] TaskCount: 1 Partition: STAFF
Network: [NONE] Memory >= 0 Disk >= 0 Swap >= 0
Opsys: [NONE] Arch: [NONE] Features: [NONE]
NodeCount: 1
Allocated Nodes:
[node003m:1]
IWD: [NONE] Executable: [NONE]
Bypass: 0 StartCount: 1
PartitionMask: [GENERAL][STAFF]
Flags: RESTARTABLE
Reservation '129683' (-00:00:02 -> 00:09:58 Duration: 00:10:00)
PE: 1.00 StartPriority: 1
**************second job - the one that gets "stuck" ***************
checking job 129684
State: Idle
Creds: user:msbritt group:users class:staff qos:DEFAULT
WallTime: 00:00:00 of 00:10:00
SubmitTime: Wed Dec 1 23:30:07
(Time Queued Total: 00:01:02 Eligible: 00:01:02)
Total Tasks: 1
Req[0] TaskCount: 1 Partition: ALL
Network: [NONE] Memory >= 0 Disk >= 0 Swap >= 0
Opsys: [NONE] Arch: [NONE] Features: [NONE]
NodeCount: 1
IWD: [NONE] Executable: [NONE]
Bypass: 1 StartCount: 0
PartitionMask: [GENERAL][STAFF]
Flags: RESTARTABLE
Reservation '129684' (00:00:00 -> 00:10:00 Duration: 00:10:00)
PE: 1.00 StartPriority: 1
job can run in partition GENERAL (9 procs available. 1 procs required)
job cannot run in partition STAFF (insufficient idle procs available: 0 < 1)
**************third job - runs in the secondary partition*******************
checking job 129685
State: Running
Creds: user:msbritt group:users class:staff qos:DEFAULT
WallTime: 00:01:41 of 00:10:00
SubmitTime: Wed Dec 1 23:30:08
(Time Queued Total: 00:00:01 Eligible: 00:00:01)
StartTime: Wed Dec 1 23:30:09
Total Tasks: 1
Req[0] TaskCount: 1 Partition: GENERAL
Network: [NONE] Memory >= 0 Disk >= 0 Swap >= 0
Opsys: [NONE] Arch: [NONE] Features: [NONE]
NodeCount: 1
Allocated Nodes:
[node065m:1]
IWD: [NONE] Executable: [NONE]
Bypass: 0 StartCount: 1
PartitionMask: [GENERAL][STAFF]
Flags: BACKFILL RESTARTABLE
Reservation '129685' (-00:01:31 -> 00:08:29 Duration: 00:10:00)
PE: 1.00 StartPriority: 1
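For anyone who wants to spot this condition automatically, the following is a minimal Python sketch that scans checkjob text for a job that is Idle even though checkjob reports it can run in some partition. The helper names `summarize_checkjob` and `is_stuck` are hypothetical, and the patterns assume the output format shown above; other Maui versions may print these lines differently.

```python
import re


def summarize_checkjob(text):
    """Pull the job state and per-partition verdicts out of checkjob text."""
    state = re.search(r"^State:\s*(\S+)", text, re.MULTILINE)
    return {
        "state": state.group(1) if state else None,
        "can_run_in": re.findall(r"job can run in partition (\S+)", text),
        "cannot_run_in": re.findall(r"job cannot run in partition (\S+)", text),
    }


def is_stuck(summary):
    """True when the job is Idle although some partition could run it."""
    return summary["state"] == "Idle" and bool(summary["can_run_in"])


# Lines copied from the checkjob output of job 129684 above.
SAMPLE = """\
checking job 129684
State: Idle
Req[0] TaskCount: 1 Partition: ALL
Reservation '129684' (00:00:00 -> 00:10:00 Duration: 00:10:00)
job can run in partition GENERAL (9 procs available. 1 procs required)
job cannot run in partition STAFF (insufficient idle procs available: 0 < 1)
"""

summary = summarize_checkjob(SAMPLE)
print(summary["state"], summary["can_run_in"], is_stuck(summary))
```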
Thanks for any help!
- matt