[Mauiusers] Maui not respecting its own standard reservations?
Jason Williams
jasonw at jhu.edu
Thu Feb 12 06:42:59 MST 2009
Hey Gareth,
How did I not see that? I read that section of the documentation about 3
times, too. And you know what? In the config on the old cluster there is
a space on the TIMELIMIT line instead of an equals sign, which makes that
line invalid. I just commented out that line and, wouldn't ya know it,
the expected behavior is happening now.
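For anyone who wants to catch this kind of slip mechanically, here is a rough sketch of a check that flags SRCFG attributes written without an equals sign. It is not part of Maui; the function name find_malformed_srcfg and the SAMPLE text are made up for illustration, and it assumes every SRCFG attribute is a whitespace-free KEY=VALUE token, which may not hold for every attribute Maui accepts.

```python
import re

# Matches lines such as: SRCFG[debug] STARTTIME=8:00:00 ENDTIME=18:00:00
SRCFG_LINE = re.compile(r"^\s*SRCFG\[(?P<name>[^\]]+)\]\s*(?P<rest>.*)$")


def find_malformed_srcfg(text):
    """Return (line_number, reservation, token) for each SRCFG token
    that is not of the form KEY=VALUE.

    Assumption: attribute values never contain whitespace.
    """
    problems = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0]  # ignore comments and commented-out lines
        match = SRCFG_LINE.match(line)
        if not match:
            continue
        for token in match.group("rest").split():
            key, sep, value = token.partition("=")
            if not sep or not key or not value:
                problems.append((lineno, match.group("name"), token))
    return problems


# Hypothetical config excerpt; the last line uses a space instead of '='.
SAMPLE = """\
SRCFG[debug] CLASSLIST=debug
SRCFG[debug] STARTTIME=8:00:00 ENDTIME=18:00:00
SRCFG[debug] TIMELIMIT 30:00
"""

if __name__ == "__main__":
    for lineno, name, token in find_malformed_srcfg(SAMPLE):
        print(f"line {lineno}: SRCFG[{name}] token {token!r} is not KEY=VALUE")
```

Run against the sample, it reports both halves of the broken TIMELIMIT line (the bare key and the bare value), since neither token contains an equals sign.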
This is why it's so important to have a second pair of eyes look at a
problem every now and then.
Thanks Gareth.
--
Jason
Gareth.Williams at csiro.au wrote:
> Hi Jason,
>
> The problem might be with your setup - and the older cluster rather than the new one. Your SRCFG has access control such that either jobs in the debug queue/class _or_ jobs with a TIMELIMIT of 30:00 can run in the reservation. Were the test jobs less than 30 minutes or outside the bounds of the STARTTIME/ENDTIME window?
>
> Perhaps the older setup was buggy or had hidden defaults. Instead of comparing the config files you might compare the outputs of
> 'showconfig | sort'
>
> Good luck,
>
> Gareth
>
>
>> -----Original Message-----
>> From: Jason Williams [mailto:jasonw at jhu.edu]
>> Sent: Thursday, 12 February 2009 1:13 AM
>> To: mauiusers at supercluster.org
>> Subject: [Mauiusers] Maui not respecting its own standard reservations?
>>
>> Hello all,
>>
>> I've been working on trying to get a very simple standard reservation to
>> work for the past few days now. The versions of Maui and Torque are
>> listed below, as is the relevant information about the configuration.
>> Now I should note that I have another, slightly older cluster, that has
>> the EXACT same configuration, minus the host names and fair share
>> quotas, where this reservation seems to work just fine.
>>
>> I did some digging in the log files on the two machines and it appears
>> that when maui checks the nodes during its scheduling iteration on the
>> old cluster, it does so in a different order than on the new cluster.
>> That statement will make more sense as you read on.
>>
>> The problem is, when I submit a job to my batch queue, which does not
>> have any standard reservations or acl_hosts in pbs, it winds up running
>> on the hosts specified as dedicated in the debug standard reservation.
>> On the old cluster, it seems that the reservation successfully keeps
>> jobs off of those hosts during the time frame mentioned.
>>
>> Does anyone have any suggestions as to what I am doing wrong? I'm sure
>> it's something small that I am missing and that the docs on the site
>> don't mention. And I wonder if the difference in the order of the
>> queues being mentioned in the log file has anything to do with it. It's
>> the only real difference I found in the logs between the two machines.
>>
>> Version Info:
>>
>> New cluster:
>> Maui version: 3.2.6p21
>> Moab Scheduling Library, version 3.2.6p20
>> Torque: 2.3.6
>>
>> Old Cluster:
>> Maui version: 3.2.6p14
>> Moab Scheduling Library, version 3.2.6p14
>> Torque: 2.0.0p8
>>
>>
>> Relevant config (same between both machines):
>>
>> maui.cfg:
>> SRCFG[debug] ACCESS=DEDICATED
>> SRCFG[debug] CLASSLIST=debug
>> SRCFG[debug] STARTTIME=8:00:00 ENDTIME=18:00:00
>> SRCFG[debug] HOSTLIST=node00[1-4]
>> SRCFG[debug] DEPTH=10
>> SRCFG[debug] DAYS=MON,TUE,WED,THU,FRI
>> SRCFG[debug] TIMELIMIT=30:00
>>
>> qmgr listing for debug queue:
>> queue_type = Execution
>> total_jobs = 0
>> state_count = Transit:0 Queued:0 Held:0 Waiting:0 Running:0 Exiting:0
>> acl_host_enable = False
>> acl_hosts = node004,node003,node002,node001
>> resources_max.walltime = 01:00:00
>> resources_default.walltime = 00:15:00
>> enabled = True
>> started = True
>>
>> Possibly relevant log entries:
>>
>> new cluster:
>>
>> MPBSNodeUpdate(node001,node001,Idle,head)
>> MPBSLoadQueueInfo(head,node001,SC)
>> INFO: queue 'debug' started state set to True
>> INFO: class to node mapping enabled for queue 'debug'
>> INFO: queue 'batch' started state set to True
>> INFO: class to node not mapping enabled for queue 'batch' adding class to all nodes
>>
>>
>> old cluster:
>> MPBSNodeUpdate(node001,node001,Idle,head)
>> MPBSLoadQueueInfo(head,node001,SC)
>> INFO: queue 'batch' started state set to True
>> INFO: class to node not mapping enabled for queue 'batch' adding class to all nodes
>> INFO: queue 'debug' started state set to True
>> INFO: class to node mapping enabled for queue 'debug'
>>
>>
>> --
>> Jason Williams
>>
>>