[Mauiusers] Maui not respecting its own standard reservations?

Gareth.Williams at csiro.au Gareth.Williams at csiro.au
Wed Feb 11 23:01:34 MST 2009


Hi Jason,

The problem might lie in your setup, and on the older cluster rather than the new one.  Your SRCFG access control lets a job run in the reservation if it is in the debug queue/class _or_ if it meets the TIMELIMIT of 30:00; either condition is enough.  Were the test jobs shorter than 30 minutes, or did they fall outside the STARTTIME/ENDTIME window?
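
To make the "or" concrete, here is a rough sketch of how I read that access rule.  It is a toy model and not Maui code: the Job type and the may_use_reservation function are made up for illustration, and whether a job of exactly 30:00 qualifies is an assumption on my part.

```python
from dataclasses import dataclass

# Toy model of the access rule described above; names are hypothetical.
RESERVATION_CLASSES = {"debug"}   # from SRCFG[debug] CLASSLIST=debug
RESERVATION_TIMELIMIT = 30 * 60   # from SRCFG[debug] TIMELIMIT=30:00, in seconds


@dataclass
class Job:
    queue: str     # Torque queue, which Maui sees as a class
    walltime: int  # requested walltime in seconds


def may_use_reservation(job):
    """Return True if either criterion admits the job (OR, not AND)."""
    in_class = job.queue in RESERVATION_CLASSES
    # Assumption: a job requesting exactly the limit still qualifies.
    short_enough = job.walltime <= RESERVATION_TIMELIMIT
    return in_class or short_enough
```

Under this reading, a batch job asking for 15 minutes would be let into the reservation even though it is not in the debug class, while a two-hour batch job would not.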

Perhaps the older setup was buggy or had hidden defaults.  Instead of comparing the config files, you might compare the output of
'showconfig | sort'
from each cluster.
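
If the two outputs are long, a small script can pull out just the lines that differ.  The following is only a sketch: it assumes you have captured the showconfig output from each cluster as text, and the function name config_diff is made up for illustration.

```python
def config_diff(old_text, new_text):
    """Compare two showconfig dumps line by line.

    Returns a pair of sorted lists: lines found only in the old dump,
    and lines found only in the new dump. Blank lines are ignored.
    """
    old = {line.strip() for line in old_text.splitlines() if line.strip()}
    new = {line.strip() for line in new_text.splitlines() if line.strip()}
    return sorted(old - new), sorted(new - old)


if __name__ == "__main__":
    # Made-up sample values, only to show the shape of the result.
    old_dump = "DEFERTIME 1:00:00\nRMPOLLINTERVAL 00:00:30\n"
    new_dump = "RMPOLLINTERVAL 00:00:30\nDEFERTIME 2:00:00\n"
    only_old, only_new = config_diff(old_dump, new_dump)
    print("only on old cluster:", only_old)
    print("only on new cluster:", only_new)
```

Host names and fair share settings will show up as differences too, so expect to skim past those.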

Good luck,

Gareth

> -----Original Message-----
> From: Jason Williams [mailto:jasonw at jhu.edu]
> Sent: Thursday, 12 February 2009 1:13 AM
> To: mauiusers at supercluster.org
> Subject: [Mauiusers] Maui not respecting its own standard reservations?
> 
> Hello all,
> 
> I've been trying to get a very simple standard reservation to
> work for the past few days now.  The versions of Maui and Torque are
> listed below, as is the relevant information about the configuration.
> Now I should note that I have another, slightly older cluster, that has
> the EXACT same configuration, minus the host names and fair share
> quotas, where this reservation seems to work just fine.
> 
> I did some digging in the log files on the two machines and it appears
> that when maui checks the nodes during its scheduling iteration on the
> old cluster, it does so in a different order than on the new cluster.
> That statement will make more sense as you read on.
> 
> The problem is, when I submit a job to my batch queue, which does not
> have any standard reservations or acl_hosts in pbs, it winds up running
> on the hosts specified as dedicated in the debug standard reservation.
> On the old cluster, it seems that the reservation successfully keeps
> jobs off of those hosts during the time frame mentioned.
> 
> Does anyone have any suggestions as to what I am doing wrong?  I'm sure
> it's something small that I am missing and that the docs on the site
> don't mention.  And I wonder if the difference in the order of the
> queues being mentioned in the log file has anything to do with it.  It's
> the only real difference I found in the logs between the two machines.
> 
> Version Info:
> 
> New cluster:
> Maui version: 3.2.6p21
> Moab Scheduling Library, version 3.2.6p20
> Torque: 2.3.6
> 
> Old Cluster:
> Maui version: 3.2.6p14
> Moab Scheduling Library, version 3.2.6p14
> Torque: 2.0.0p8
> 
> 
> Relevant config (same between both machines):
> 
> maui.cfg:
> SRCFG[debug]  ACCESS=DEDICATED
> SRCFG[debug]  CLASSLIST=debug
> SRCFG[debug]  STARTTIME=8:00:00 ENDTIME=18:00:00
> SRCFG[debug]  HOSTLIST=node00[1-4]
> SRCFG[debug]  DEPTH=10
> SRCFG[debug]  DAYS=MON,TUE,WED,THU,FRI
> SRCFG[debug]  TIMELIMIT=30:00
> 
> qmgr listing for debug queue:
>         queue_type = Execution
>         total_jobs = 0
>         state_count = Transit:0 Queued:0 Held:0 Waiting:0 Running:0 Exiting:0
>         acl_host_enable = False
>         acl_hosts = node004,node003,node002,node001
>         resources_max.walltime = 01:00:00
>         resources_default.walltime = 00:15:00
>         enabled = True
>         started = True
> 
> Possibly relevant log entries:
> 
> new cluster:
> 
> MPBSNodeUpdate(node001,node001,Idle,head)
> MPBSLoadQueueInfo(head,node001,SC)
> INFO:     queue 'debug' started state set to True
> INFO:     class to node mapping enabled for queue 'debug'
> INFO:     queue 'batch' started state set to True
> INFO:     class to node not mapping enabled for queue 'batch' adding class to all nodes
> 
> 
> old cluster:
> MPBSNodeUpdate(node001,node001,Idle,head)
> MPBSLoadQueueInfo(head,node001,SC)
> INFO:     queue 'batch' started state set to True
> INFO:     class to node not mapping enabled for queue 'batch' adding class to all nodes
> INFO:     queue 'debug' started state set to True
> INFO:     class to node mapping enabled for queue 'debug'
> 
> 
> --
> Jason Williams
> 


