[Mauiusers] job not run because of swap utilization?
Catherine Pitt
cen1001 at cam.ac.uk
Wed Aug 11 10:29:29 MDT 2004
Hi,
I've been using Maui 3.2.5p7 without any problems on one of my clusters
with OpenPBS 2.3.16 for about six months and am very happy with it. I
recently put it on another cluster, also with OpenPBS, and am having
trouble getting jobs to run on some nodes because Maui seems to think that
the nodes with free processors don't have enough free swap space. All the
nodes on both clusters are dual processor and most of the jobs are serial
so the nodes would normally run two jobs at once. Maui has no problem
starting a job on a completely idle node; it's just when there's already
one job there that I have trouble.
Can anyone help please?
checkjob gives me:
$ checkjob 15647
checking job 15647
State: Idle (User: vcb25 Group: vcb25)
WallTime: 0:00:00 of 2:08:00:00
SubmitTime: Wed Aug 11 15:11:25
(Time Queued Total: 2:04:13 Eligible: 2:03:00)
Total Tasks: 1
Req[0] TaskCount: 1 Partition: ALL
Network: [NONE] Memory >= 0 Disk >= 0 Swap >= 0
Opsys: [NONE] Arch: [NONE] Class: [serial_med_1gb 1] Features: [medium]
IWD: [NONE] Executable: [NONE]
QOS: DEFAULT Bypass: 0 StartCount: 1
PartitionMask: [ALL]
Flags: RESTARTABLE
PE: 1.00 StartPriority: 226
job cannot run in partition DEFAULT (idle procs do not meet requirements : 0 of 1 procs found)
idle procs: 11 feasible procs: 0
Rejection Reasons: [Features : 1][Swap : 5][State : 10]
So five idle processors are being rejected purely on swap grounds and could
otherwise run this job. If I run checknode on one of the nodes with a free
processor I get:
$ checknode node06
checking node node06
State: Running (in current state for 0:00:00)
Configured Resources: PROCS: 2 MEM: 1 SWAP: 10 DISK: 1
Utilized Resources: SWAP: 2
Dedicated Resources: PROCS: 1
Opsys: DEFAULT Arch: [NONE]
Speed: 1.00 Load: 0.000
Network: [DEFAULT]
Features: [1][medium][long]
Classes: [serial_long_1gb 2:2][serial_long_2gb 2:2][par_2proc_med 2:2][par_4proc_med 2:2][par_8proc_med 2:2][serial_med_1gb 1:2][serial_test_1gb 2:2][serial_med_2gb 2:2][serial_test 2:2][serial_test_2gb 2:2]
Total Time: 30:10:54:41 Up: 30:10:54:41 (100.00%) Active: 29:20:36:17 (98.04%)
Reservations:
Job '15631'(x1) -12:25:34 -> 1:19:34:26 (2:08:00:00)
JobList: 15631
ALERT: node has 1 procs dedicated but load is low (0.000)
The SWAP and MEM totals are obviously wrong, but they're wrong on my other
cluster too and I've never had this problem there. I always assumed that
was down to a bug in PBS. The job shouldn't be requesting swap or memory,
only a single processor, and as far as I can tell that's what it has
requested.
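For anyone comparing several queued jobs, the "Rejection Reasons" line in the
checkjob output above can be tallied with a short script. This is only a
minimal sketch in Python: the function name parse_rejection_reasons is made up
for illustration, and the bracketed "[Name : count]" layout is assumed from
the single sample shown here rather than from any documented Maui format.

```python
import re

def parse_rejection_reasons(text):
    """Return {reason: count} parsed from checkjob-style output.

    Hypothetical helper: the "[Name : count]" layout is assumed from
    the one sample in this post, not from Maui documentation.
    """
    # Collapse whitespace so an entry wrapped across lines
    # (e.g. "[State\n: 10]") is read as a single entry.
    flat = " ".join(text.split())
    _, found, tail = flat.partition("Rejection Reasons:")
    if not found:
        return {}
    pairs = re.findall(r"\[\s*(\w+)\s*:\s*(\d+)\s*\]", tail)
    return {name: int(count) for name, count in pairs}

sample = """Rejection Reasons: [Features : 1][Swap : 5][State
: 10]"""
reasons = parse_rejection_reasons(sample)
print(reasons)
```

Feeding it the output of checkjob for each idle job makes it easy to see
whether Swap is the reason that keeps recurring.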
I tried setting
NODEAVAILABILITYPOLICY DEDICATED
in the hope that this would make Maui ignore the 'used' swap and just go
by free processors, but it doesn't seem to have helped. I also tried
NODEAVAILABILITYPOLICY DEDICATED DEDICATED:SWAP
but that doesn't seem to have made a difference.
Suggestions gratefully received!
Catherine
Dr Catherine Pitt, Computer Officer
Department of Chemistry, University of Cambridge
cen1001 at cam.ac.uk