[Mauiusers] job not run because of swap utilization?

Catherine Pitt cen1001 at cam.ac.uk
Wed Aug 11 10:29:29 MDT 2004


Hi,

I've been using Maui 3.2.5p7 without any problems on one of my clusters
with OpenPBS 2.3.16 for about six months and am very happy with it. I
recently put it on another cluster, also with OpenPBS, and am having
trouble getting jobs to run on some nodes because Maui seems to think that
the nodes with free processors don't have enough free swap space. All the
nodes on both clusters are dual processor and most of the jobs are serial
so the nodes would normally run two jobs at once. Maui has no problem
starting a job on a completely idle node; it's just when there's already
one job there that I have trouble.

Can anyone help please? 

checkjob gives me:

$ checkjob  15647


checking job 15647

State: Idle  (User: vcb25  Group: vcb25)
WallTime: 0:00:00 of 2:08:00:00
SubmitTime: Wed Aug 11 15:11:25
  (Time Queued  Total: 2:04:13  Eligible: 2:03:00)

Total Tasks: 1

Req[0]  TaskCount: 1  Partition: ALL
Network: [NONE]  Memory >= 0  Disk >= 0  Swap >= 0
Opsys: [NONE]  Arch: [NONE]  Class: [serial_med_1gb 1]  Features: [medium]


IWD: [NONE]  Executable:  [NONE]
QOS: DEFAULT  Bypass: 0  StartCount: 1
PartitionMask: [ALL]
Flags:       RESTARTABLE

PE:  1.00  StartPriority:  226
job cannot run in partition DEFAULT (idle procs do not meet requirements : 0 of 1 procs found)
idle procs:  11  feasible procs:   0

Rejection Reasons: [Features     :    1][Swap         :    5][State        :   10]

So there are five free processors that could run this job but are being
rejected on swap. If I run checknode on one of the nodes with a free
processor I get:

$ checknode node06


checking node node06

State:   Running  (in current state for 0:00:00)
Configured Resources: PROCS: 2  MEM: 1  SWAP: 10  DISK: 1
Utilized   Resources: SWAP: 2
Dedicated  Resources: PROCS: 1
Opsys:       DEFAULT  Arch:      [NONE]
Speed:      1.00  Load:       0.000
Network:    [DEFAULT]
Features:   [1][medium][long]
Classes:    [serial_long_1gb 2:2][serial_long_2gb 2:2][par_2proc_med 2:2][par_4proc_med 2:2][par_8proc_med 2:2][serial_med_1gb 1:2][serial_test_1gb 2:2][serial_med_2gb 2:2][serial_test 2:2][serial_test_2gb 2:2]

Total Time: 30:10:54:41  Up: 30:10:54:41 (100.00%)  Active: 29:20:36:17 (98.04%)

Reservations:
Job '15631'(x1)  -12:25:34 -> 1:19:34:26 (2:08:00:00)
JobList:  15631
ALERT:  node has 1 procs dedicated but load is low (0.000)

The SWAP and MEM totals are obviously wrong, but they're wrong on my other
cluster too and I've never had this problem there. I always assumed that
was down to a bug in PBS. The job shouldn't be requesting swap or memory,
only a single processor, and as far as I can tell that's what it has
requested. 
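
In case it helps anyone compare rejection counts across jobs, here is a
minimal sketch that tallies the "Rejection Reasons" summary from the
checkjob output above. The function name parse_rejections is hypothetical
and not part of Maui; the sketch assumes the summary keeps the
[Name : count] layout shown above and copes with the line being wrapped
by a mail client.

```python
import re


def parse_rejections(text):
    """Return {reason: count} from a checkjob 'Rejection Reasons' summary.

    Hypothetical helper: assumes entries look like [Name : count].
    """
    # Mail clients wrap long lines, so collapse all whitespace first.
    flat = " ".join(text.split())
    # Only look at what follows the 'Rejection Reasons:' label.
    _, _, tail = flat.partition("Rejection Reasons:")
    pairs = re.findall(r"\[(\w+)\s*:\s*(\d+)\]", tail)
    return {name: int(count) for name, count in pairs}


# Summary copied from the checkjob output for job 15647, wrap included.
sample = """Rejection Reasons: [Features     :    1][Swap         :    5][State
:   10]"""

reasons = parse_rejections(sample)
print(reasons)
```

Running this over the checkjob output of several idle jobs would show
whether Swap is the reason every time, or only for certain classes.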

I tried setting 

NODEAVAILABILITYPOLICY  DEDICATED

in the hope that this would make Maui ignore the 'used' swap and just go
by free processors, but it doesn't seem to have helped. I also tried 

NODEAVAILABILITYPOLICY  DEDICATED DEDICATED:SWAP

but that doesn't seem to have made a difference. 

Suggestions gratefully received!

Catherine

Dr Catherine Pitt      Computer Officer, Department of Chemistry
cen1001 at cam.ac.uk      University of Cambridge




