[Mauiusers] Maui not respecting its own standard reservations?

Jason Williams jasonw at jhu.edu
Thu Feb 12 06:42:59 MST 2009


Hey Gareth,
How did I not see that?  I read that section of the documentation like 3
times too.  And you know what, in the config on the old cluster there is
a space on the TIMELIMIT line instead of an equal sign, making that line
invalid.  I just commented out that line and wouldn't ya know it, the
expected behavior is happening now.
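
For anyone who hits the same thing, a typo like this can probably be caught mechanically.  Below is a rough sketch of a check, not anything that ships with Maui: the function name find_malformed_srcfg is made up for illustration, and it assumes every token after SRCFG[name] should have the form KEY=VALUE with no spaces around the equal sign, which is how the lines in this thread are written.

```python
import re

# Matches lines such as "SRCFG[debug]  TIMELIMIT=30:00".
SRCFG_LINE = re.compile(r"^\s*SRCFG\[(?P<name>[^\]]+)\]\s+(?P<rest>.*)$")


def find_malformed_srcfg(text):
    """Return (line_number, reservation, token) for each SRCFG token that is not KEY=VALUE.

    Assumption: attributes are whitespace-separated KEY=VALUE pairs.
    This is a hypothetical helper, not part of Maui.
    """
    problems = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0]  # ignore comments and commented-out lines
        match = SRCFG_LINE.match(line)
        if not match:
            continue
        for token in match.group("rest").split():
            key, sep, value = token.partition("=")
            if not sep or not key or not value:
                problems.append((lineno, match.group("name"), token))
    return problems


sample = """\
SRCFG[debug]  CLASSLIST=debug
SRCFG[debug]  STARTTIME=8:00:00 ENDTIME=18:00:00
SRCFG[debug]  TIMELIMIT 30:00
"""

for lineno, name, token in find_malformed_srcfg(sample):
    print(f"line {lineno}: SRCFG[{name}] token {token!r} is not KEY=VALUE")
```

On the sample, both halves of the broken attribute ("TIMELIMIT" and "30:00") are reported for line 3, which points straight at the missing equal sign.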

This is why it's so important to have a second pair of eyes look at a 
problem every now and then.

Thanks Gareth.
--
Jason


Gareth.Williams at csiro.au wrote:
> Hi Jason,
>
> The problem might be with your setup - and the older cluster rather than the new one.  Your SRCFG has access control such that either jobs in the debug queue/class _or_ jobs with a TIMELIMIT of 30:00 can run in the reservation.  Were the test jobs less than 30 minutes or outside the bounds of the STARTTIME/ENDTIME window?  
>
> Perhaps the older setup was buggy or had hidden defaults.  Instead of comparing the config files you might compare the outputs of 
> 'showconfig | sort'
>
> Good luck,
>
> Gareth
>
>   
>> -----Original Message-----
>> From: Jason Williams [mailto:jasonw at jhu.edu]
>> Sent: Thursday, 12 February 2009 1:13 AM
>> To: mauiusers at supercluster.org
>> Subject: [Mauiusers] Maui not respecting its own standard reservations?
>>
>> Hello all,
>>
>> I've been working on trying to get a very simple standard reservation to
>> work for the past few days now.  The versions of Maui and Torque are
>> listed below, as is the relevant information about the configuration.
>> Now I should note that I have another, slightly older cluster, that has
>> the EXACT same configuration, minus the host names and fair share
>> quotas, where this reservation seems to work just fine.
>>
>> I did some digging in the log files on the two machines and it appears
>> that when maui checks the nodes during its scheduling iteration on the
>> old cluster, it does so in a different order than on the new cluster.
>> That statement will make more sense as you read on.
>>
>> The problem is, when I submit a job to my batch queue, which does not
>> have any standard reservations or acl_hosts in pbs, it winds up running
>> on the hosts specified as dedicated in the debug standard reservation.
>> On the old cluster, it seems that the reservation successfully keeps
>> jobs off of those hosts during the time frame mentioned.
>>
>> Does anyone have any suggestions as to what I am doing wrong?  I'm sure
>> it's something small that I am missing and that the docs on the site
>> don't mention.  And I wonder if the difference in the order of the
>> queues being mentioned in the log file has anything to do with it.  It's
>> the only real difference I found in the logs between the two machines.
>>
>> Version Info:
>>
>> New cluster:
>> Maui version: 3.2.6p21
>> Moab Scheduling Library, version 3.2.6p20
>> Torque: 2.3.6
>>
>> Old Cluster:
>> Maui version: 3.2.6p14
>> Moab Scheduling Library, version 3.2.6p14
>> Torque: 2.0.0p8
>>
>>
>> Relevant config (same between both machines):
>>
>> maui.cfg:
>> SRCFG[debug]  ACCESS=DEDICATED
>> SRCFG[debug]  CLASSLIST=debug
>> SRCFG[debug]  STARTTIME=8:00:00 ENDTIME=18:00:00
>> SRCFG[debug]  HOSTLIST=node00[1-4]
>> SRCFG[debug]  DEPTH=10
>> SRCFG[debug]  DAYS=MON,TUE,WED,THU,FRI
>> SRCFG[debug]  TIMELIMIT=30:00
>>
>> qmgr listing for debug queue:
>>         queue_type = Execution
>>         total_jobs = 0
>>         state_count = Transit:0 Queued:0 Held:0 Waiting:0 Running:0 Exiting:0
>>         acl_host_enable = False
>>         acl_hosts = node004,node003,node002,node001
>>         resources_max.walltime = 01:00:00
>>         resources_default.walltime = 00:15:00
>>         enabled = True
>>         started = True
>>
>> Possibly relevant log entries:
>>
>> new cluster:
>>
>> MPBSNodeUpdate(node001,node001,Idle,head)
>> MPBSLoadQueueInfo(head,node001,SC)
>> INFO:     queue 'debug' started state set to True
>> INFO:     class to node mapping enabled for queue 'debug'
>> INFO:     queue 'batch' started state set to True
>> INFO:     class to node not mapping enabled for queue 'batch' adding class to all nodes
>>
>>
>> old cluster:
>> MPBSNodeUpdate(node001,node001,Idle,head)
>> MPBSLoadQueueInfo(head,node001,SC)
>> INFO:     queue 'batch' started state set to True
>> INFO:     class to node not mapping enabled for queue 'batch' adding class to all nodes
>> INFO:     queue 'debug' started state set to True
>> INFO:     class to node mapping enabled for queue 'debug'
>>
>>
>> --
>> Jason Williams
>>
>>     


