[Mauiusers] Having problems scheduling jobs with PPN specification on a cluster.

Jonathan K Shelley Jonathan.Shelley at inl.gov
Fri Feb 5 16:36:56 MST 2010


I tried that and it worked: my jobs were scheduled. However, my node 
file has only one line in it instead of 24. I then found the 
JOBNODEMATCHPOLICY parameter and removed it from my maui.cfg, after which 
I could submit using -l nodes=24, which gave me what I wanted. But when I 
try -l nodes=4:ppn=6, the job won't schedule. From what I have read, this 
should work under Torque's liberal scheduling policy.
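
For reference, Torque normally writes one line per allocated processor slot 
to the file named by $PBS_NODEFILE, so a 24-slot job would be expected to 
list 24 lines. A minimal Python sketch for checking what a job actually 
received follows; the helper name summarize_nodefile is illustrative only 
and not part of Torque or Maui.

```python
import os
from collections import Counter


def summarize_nodefile(lines):
    """Count how many slots (lines) each host has in a node file."""
    hosts = [ln.strip() for ln in lines if ln.strip()]
    return Counter(hosts)


if __name__ == "__main__":
    # PBS_NODEFILE is set by Torque inside a running job.
    path = os.environ.get("PBS_NODEFILE")
    if path:
        with open(path) as fh:
            counts = summarize_nodefile(fh)
        for host, n in sorted(counts.items()):
            print(f"{host}: {n} slot(s)")
        print(f"total: {sum(counts.values())}")
    else:
        print("PBS_NODEFILE is not set; run this inside a Torque job")
```

Run inside an interactive job (qsub -I ...), this should make it easy to 
see whether the node file has one line or one line per slot.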

When I run checkjob, it returns:

checking job 1098

State: Idle
Creds:  user:jon  group:jon  class:all  qos:DEFAULT
WallTime: 00:00:00 of 21:00:00:00
SubmitTime: Fri Feb  5 16:15:25
  (Time Queued  Total: 00:17:43  Eligible: 00:17:43)

Total Tasks: 60

Req[0]  TaskCount: 60  Partition: ALL
Network: [NONE]  Memory >= 0  Disk >= 0  Swap >= 0
Opsys: [NONE]  Arch: [NONE]  Features: [NONE]


IWD: [NONE]  Executable:  [NONE]
Bypass: 0  StartCount: 0
PartitionMask: [ALL]
Reservation '1098' (23:32:14 -> 21:23:32:14  Duration: 21:00:00:00)
PE:  60.00  StartPriority:  17
job can run in partition DEFAULT (108 procs available.  60 procs required)


Here is my configuration:

# Resource Manager Definition

RMCFG[torque] TYPE=PBS

# Allocation Manager Definition

AMCFG[bank]  TYPE=NONE

# full parameter docs at http://supercluster.org/mauidocs/a.fparameters.html
# use the 'schedctl -l' command to display current configuration

RMPOLLINTERVAL        00:00:30

SERVERPORT            42559
SERVERMODE            NORMAL

# Admin: http://supercluster.org/mauidocs/a.esecurity.html


LOGFILE               maui.log
LOGFILEMAXSIZE        10000000
LOGLEVEL              3

# Job Priority: http://supercluster.org/mauidocs/5.1jobprioritization.html

QUEUETIMEWEIGHT       1

# Backfill: http://supercluster.org/mauidocs/8.2backfill.html

BACKFILLPOLICY        FIRSTFIT
RESERVATIONPOLICY     CURRENTHIGHEST

# Node Allocation: http://supercluster.org/mauidocs/5.2nodeallocation.html

NODEACCESSPOLICY        SHARED
NODEALLOCATIONPOLICY    MINRESOURCE
ENABLEMULTIREQJOBS      TRUE
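
One way to confirm which parameters are actually set in a maui.cfg (for 
example, that JOBNODEMATCHPOLICY is really gone) is to parse the file. A 
minimal Python sketch, assuming one parameter per line and '#' comments; 
parse_maui_cfg is a hypothetical helper, not something Maui ships.

```python
def parse_maui_cfg(text):
    """Return {PARAMETER: value} for maui.cfg-style text."""
    params = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        parts = line.split(None, 1)  # parameter name, then the rest
        key = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        params[key] = value
    return params


# Sample taken from the node allocation section of the config above.
sample = """\
# Node Allocation
NODEACCESSPOLICY        SHARED
NODEALLOCATIONPOLICY    MINRESOURCE
ENABLEMULTIREQJOBS      TRUE
"""

cfg = parse_maui_cfg(sample)
print("JOBNODEMATCHPOLICY" in cfg)   # False
print(cfg["NODEALLOCATIONPOLICY"])   # MINRESOURCE
```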

Any ideas?

Thanks,

Jon Shelley
HPC Software Consultant
Idaho National Lab
Phone (208) 526-9834
Fax (208) 526-0122




<Gareth.Williams at csiro.au> 
01/28/2010 03:24 PM

To: <Jonathan.Shelley at inl.gov>
cc:
Subject: RE: [Mauiusers] Having problems scheduling jobs with PPN specification on a cluster.

Hi Jon,
 
The parameter JOBNODEMATCHPOLICY matters.
It turns out that there is an alternative syntax but I don't know if maui 
supports it (or just moab).
Could you do me a favour and try a job with qsub -l procs=24
(or any other number - maybe > 24) and see if it works.
There was a recent thread on torqueusers on this.
 
cheers,
 
Gareth
 

From: Jonathan K Shelley [mailto:Jonathan.Shelley at inl.gov] 
Sent: Thursday, 28 January 2010 9:15 AM
To: mauiusers at supercluster.org
Subject: [Mauiusers] Having problems scheduling jobs with PPN specification on a cluster.

Maui Version: 3.2.6p21 
Torque Version: 2.3.6 

I have a cluster with 4 compute nodes. 1 node has 16 cores and the other 3 
each have 24 cores. One of the nodes is down for maintenance. So I have 64 
cores available to run jobs. 
When I submit this line 

qsub -I -l nodes=4:ppn=6 

I get the following error: 
01/27 13:41:39 INFO:     64 feasible tasks found for job 384:0 in partition DEFAULT (24 Needed)
01/27 13:41:39 INFO:     inadequate feasible nodes found for job 384:0 in partition DEFAULT (3 < 4)
01/27 13:41:39 ALERT:    job 384 cannot run in any partition
01/27 13:41:39 ALERT:    cannot create new reservation for job 384 (shape[1] 24)
01/27 13:41:39 ALERT:    cannot create new reservation for job 384
01/27 13:41:39 MJobSetHold(384,16,1:00:00,NoResources,cannot create reservation for job '384' (intital reservation attempt)

If I submit this line 

qsub -I -l nodes=3:ppn=8 

it works just fine 
01/27 14:47:16 INFO:     64 feasible tasks found for job 385:0 in partition DEFAULT (24 Needed)
01/27 14:47:16 INFO:     tasks located for job 385:  40 of 24 required (0 feasible)
01/27 14:47:16 MJobStart(385)
01/27 14:47:16 MJobStart(385)
01/27 14:47:16 MJobDistributeTasks(385,eos,NodeList,TaskMap)
01/27 14:47:16 MAMAllocJReserve(385,RIndex,ErrMsg)
01/27 14:47:16 MRMJobStart(385,Msg,SC)
01/27 14:47:16 MPBSJobStart(385,eos,Msg,SC)
01/27 14:47:16 MPBSJobModify(385,Resource_List,Resource,eos03.inel.gov:ppn=8+eos01.inel.gov:ppn=8+eos:ppn=8)
01/27 14:47:16 MPBSJobModify(385,Resource_List,Resource,3:ppn=8)
01/27 14:47:16 INFO:     job '385' successfully started


My maui.cfg file has the following variables set. 
BACKFILLPOLICY        FIRSTFIT 
RESERVATIONPOLICY     CURRENTHIGHEST 

NODEACCESSPOLICY        SHARED 
NODEALLOCATIONPOLICY    FIRSTFIT 
ENABLEMULTIREQJOBS      TRUE 


Am I missing something? Is there a way for Maui to schedule the job, 
since there is clearly enough space to run it if two of the requests are 
placed on one node?
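
The arithmetic behind the "3 < 4" message can be sketched in Python. This 
is only a model of the two readings of nodes=N:ppn=M (one chunk per host 
versus chunks packed onto shared hosts), not Maui's actual algorithm. The 
host names come from the log above; which host has 16 cores is an 
assumption.

```python
def fits_exact_nodes(free_procs, nodes, ppn):
    """Strict reading: each of the `nodes` chunks needs its own host."""
    usable = [p for p in free_procs.values() if p >= ppn]
    return len(usable) >= nodes


def fits_packed(free_procs, nodes, ppn):
    """Relaxed reading: several ppn-sized chunks may share one host."""
    chunks = sum(p // ppn for p in free_procs.values())
    return chunks >= nodes


# Three nodes up (16 + 24 + 24 = 64 cores), as described in the post.
# The mapping of core counts to host names is assumed for illustration.
free = {"eos": 16, "eos01": 24, "eos03": 24}

print(fits_exact_nodes(free, 4, 6))  # False: only 3 hosts are up
print(fits_packed(free, 4, 6))       # True: 2 + 4 + 4 = 10 chunks of 6
print(fits_exact_nodes(free, 3, 8))  # True: matches the nodes=3:ppn=8 run
```

Under the strict reading the 4:ppn=6 request fails with three hosts up 
even though 64 cores are free, which is consistent with the log.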

Thanks,

Jon Shelley
HPC Software Consultant
Idaho National Lab