[Mauiusers] Maui not respecting its own standard reservations?

Jason Williams jasonw at jhu.edu
Wed Feb 11 07:13:09 MST 2009


Hello all,

I've been trying to get a very simple standard reservation to work for 
the past few days.  The Maui and Torque versions are listed below, along 
with the relevant configuration.  I should note that I have another, 
slightly older cluster with exactly the same configuration, apart from 
the host names and fair share quotas, where this reservation works just 
fine. 

I did some digging in the log files on the two machines, and it appears 
that when Maui loads the queue information for a node during its 
scheduling iteration, it processes the queues in a different order on 
the old cluster than on the new one.  The log excerpts at the end of 
this message show what I mean. 

The problem is, when I submit a job to my batch queue, which does not 
have any standard reservations or acl_hosts in pbs, it winds up running 
on the hosts specified as dedicated in the debug standard reservation.  
On the old cluster, it seems that the reservation successfully keeps 
jobs off of those hosts during the time frame mentioned. 

Does anyone have any suggestions as to what I am doing wrong?  I'm sure 
it's something small that I am missing and that the docs on the site 
don't mention.  And I wonder if the difference in the order of the 
queues being mentioned in the log file has anything to do with it.  It's 
the only real difference I found in the logs between the two machines.

Version Info:

New cluster:
Maui version: 3.2.6p21
Moab Scheduling Library, version 3.2.6p20
Torque: 2.3.6

Old Cluster:
Maui version: 3.2.6p14
Moab Scheduling Library, version 3.2.6p14
Torque: 2.0.0p8


Relevant config (same between both machines):

maui.cfg:
SRCFG[debug]  ACCESS=DEDICATED
SRCFG[debug]  CLASSLIST=debug
SRCFG[debug]  STARTTIME=8:00:00 ENDTIME=18:00:00
SRCFG[debug]  HOSTLIST=node00[1-4]
SRCFG[debug]  DEPTH=10
SRCFG[debug]  DAYS=MON,TUE,WED,THU,FRI
SRCFG[debug]  TIMELIMIT=30:00
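
As a sanity check on the window those lines describe, here is a minimal 
Python sketch.  It assumes STARTTIME and ENDTIME are offsets from 
midnight on each day named in DAYS; that reading is an assumption, not 
something confirmed from the Maui source.  The function name 
in_debug_window is made up for illustration.

```python
from datetime import datetime, time

# Values copied from the SRCFG[debug] lines above.
DAY_NAMES = ["MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN"]
DAYS = {"MON", "TUE", "WED", "THU", "FRI"}
START = time(8, 0, 0)    # STARTTIME=8:00:00
END = time(18, 0, 0)     # ENDTIME=18:00:00


def in_debug_window(ts):
    """Return True if ts falls inside the reservation window.

    Assumes the window opens at START and closes at END on each day
    listed in DAYS (hypothetical interpretation of the config).
    """
    day = DAY_NAMES[ts.weekday()]  # weekday(): Monday is 0
    return day in DAYS and START <= ts.time() < END


if __name__ == "__main__":
    # A weekday morning inside the window, then a Saturday.
    print(in_debug_window(datetime(2009, 2, 11, 9, 0, 0)))
    print(in_debug_window(datetime(2009, 2, 14, 9, 0, 0)))
```

A job that starts on a reserved node at a time for which this returns 
False would not conflict with the reservation under that reading.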

qmgr listing for debug queue:
        queue_type = Execution
        total_jobs = 0
        state_count = Transit:0 Queued:0 Held:0 Waiting:0 Running:0 Exiting:0
        acl_host_enable = False
        acl_hosts = node004,node003,node002,node001
        resources_max.walltime = 01:00:00
        resources_default.walltime = 00:15:00
        enabled = True
        started = True
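
To rule out a mismatch between the reservation's HOSTLIST and the 
queue's acl_hosts, the two can be compared with a short Python sketch.  
It assumes the HOSTLIST value behaves like a regular expression that 
must match the whole host name; that is an assumption about Maui's 
matching, and matches_hostlist is a hypothetical helper.

```python
import re

# Values copied from maui.cfg and the qmgr listing above.
HOSTLIST = "node00[1-4]"
ACL_HOSTS = "node004,node003,node002,node001".split(",")


def matches_hostlist(host, pattern=HOSTLIST):
    """Return True if host matches the HOSTLIST expression in full.

    Whole-name matching is an assumption about how Maui applies it.
    """
    return re.fullmatch(pattern, host) is not None


# Hosts in acl_hosts that the HOSTLIST expression would not cover.
unmatched = [h for h in ACL_HOSTS if not matches_hostlist(h)]
print(unmatched)
```

An empty list means every host in acl_hosts is covered by the HOSTLIST 
expression under that assumption.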

Possibly relevant log entries:

new cluster:

MPBSNodeUpdate(node001,node001,Idle,head)
MPBSLoadQueueInfo(head,node001,SC)
INFO:     queue 'debug' started state set to True
INFO:     class to node mapping enabled for queue 'debug'
INFO:     queue 'batch' started state set to True
INFO:     class to node not mapping enabled for queue 'batch' adding class to all nodes


old cluster:
MPBSNodeUpdate(node001,node001,Idle,head)
MPBSLoadQueueInfo(head,node001,SC)
INFO:     queue 'batch' started state set to True
INFO:     class to node not mapping enabled for queue 'batch' adding class to all nodes
INFO:     queue 'debug' started state set to True
INFO:     class to node mapping enabled for queue 'debug'


--
Jason Williams


