[Mauiusers] Maui not respecting it's own standard reservations?
Gareth.Williams at csiro.au
Wed Feb 11 23:01:34 MST 2009
Hi Jason,
The problem might be with your setup, and with the older cluster rather than the new one. Your SRCFG has access control such that either jobs in the debug queue/class _or_ jobs with a TIMELIMIT of 30:00 can run in the reservation. Were the test jobs shorter than 30 minutes, or run outside the STARTTIME/ENDTIME window?
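To make the "either/or" reading above concrete, here is a rough Python sketch. It is not Maui's code: the function names are made up for illustration, and treating TIMELIMIT as an upper bound on the job's walltime is an assumption.

```python
def to_seconds(text):
    """Convert an [[HH:]MM:]SS string such as '30:00' to seconds."""
    seconds = 0
    for field in text.split(":"):
        seconds = seconds * 60 + int(field)
    return seconds


def may_use_reservation(job_class, job_walltime,
                        class_list=("debug",), time_limit="30:00"):
    """Hypothetical check: a job qualifies if it matches EITHER criterion."""
    in_class = job_class in class_list
    # Assumption: TIMELIMIT acts as a maximum walltime for the job.
    short_enough = to_seconds(job_walltime) <= to_seconds(time_limit)
    return in_class or short_enough


if __name__ == "__main__":
    # A batch job using the 15-minute default walltime would qualify
    # under this reading, even though it is not in the debug class.
    for cls, wall in [("debug", "01:00:00"),
                      ("batch", "00:15:00"),
                      ("batch", "02:00:00")]:
        print(cls, wall, may_use_reservation(cls, wall))
```

If this reading is right, a short batch job landing on the reserved nodes would be expected behaviour rather than a bug.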
Perhaps the older setup was buggy or had hidden defaults. Instead of comparing the config files, you might compare the outputs on the two clusters of
'showconfig | sort'
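One way to do that comparison, assuming the output has been saved to a file on each cluster (for example with 'showconfig | sort > old.txt'; the file names and the function name here are only placeholders), is a small Python helper that lists the lines unique to each side:

```python
def config_diff(old_text, new_text):
    """Return (only_in_old, only_in_new) as sorted lists of lines."""
    old = {line.strip() for line in old_text.splitlines() if line.strip()}
    new = {line.strip() for line in new_text.splitlines() if line.strip()}
    return sorted(old - new), sorted(new - old)


if __name__ == "__main__":
    # Hypothetical file names holding the saved output from each cluster.
    with open("old.txt") as f_old, open("new.txt") as f_new:
        only_old, only_new = config_diff(f_old.read(), f_new.read())
    for line in only_old:
        print("old only:", line)
    for line in only_new:
        print("new only:", line)
```

Host names and fair share settings will show up as differences too, so expect some noise; anything else that differs is a candidate for a hidden default.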
Good luck,
Gareth
> -----Original Message-----
> From: Jason Williams [mailto:jasonw at jhu.edu]
> Sent: Thursday, 12 February 2009 1:13 AM
> To: mauiusers at supercluster.org
> Subject: [Mauiusers] Maui not respecting it's own standard reservations?
>
> Hello all,
>
> I've been working on trying to get a very simple standard reservation to
> work for the past few days now. The versions of Maui and Torque are
> listed below, as is the relevant information about the configuration.
> Now I should note that I have another, slightly older cluster, that has
> the EXACT same configuration, minus the host names and fair share
> quotas, where this reservation seems to work just fine.
>
> I did some digging in the log files on the two machines and it appears
> that when maui checks the nodes during its scheduling iteration on the
> old cluster, it does so in a different order than on the new cluster.
> That statement will make more sense as you read on.
>
> The problem is, when I submit a job to my batch queue, which does not
> have any standard reservations or acl_hosts in pbs, it winds up running
> on the hosts specified as dedicated in the debug standard reservation.
> On the old cluster, it seems that the reservation successfully keeps
> jobs off of those hosts during the time frame mentioned.
>
> Does anyone have any suggestions as to what I am doing wrong? I'm sure
> it's something small that I am missing and that the docs on the site
> don't mention. And I wonder if the difference in the order of the
> queues being mentioned in the log file has anything to do with it. It's
> the only real difference I found in the logs between the two machines.
>
> Version Info:
>
> New cluster:
> Maui version: 3.2.6p21
> Moab Scheduling Library, version 3.2.6p20
> Torque: 2.3.6
>
> Old Cluster:
> Maui version: 3.2.6p14
> Moab Scheduling Library, version 3.2.6p14
> Torque: 2.0.0p8
>
>
> Relevant config (same between both machines):
>
> maui.cfg:
> SRCFG[debug] ACCESS=DEDICATED
> SRCFG[debug] CLASSLIST=debug
> SRCFG[debug] STARTTIME=8:00:00 ENDTIME=18:00:00
> SRCFG[debug] HOSTLIST=node00[1-4]
> SRCFG[debug] DEPTH=10
> SRCFG[debug] DAYS=MON,TUE,WED,THU,FRI
> SRCFG[debug] TIMELIMIT=30:00
>
> qmgr listing for debug queue:
> queue_type = Execution
> total_jobs = 0
> state_count = Transit:0 Queued:0 Held:0 Waiting:0 Running:0 Exiting:0
> acl_host_enable = False
> acl_hosts = node004,node003,node002,node001
> resources_max.walltime = 01:00:00
> resources_default.walltime = 00:15:00
> enabled = True
> started = True
>
> Possibly relevant log entries:
>
> new cluster:
>
> MPBSNodeUpdate(node001,node001,Idle,head)
> MPBSLoadQueueInfo(head,node001,SC)
> INFO: queue 'debug' started state set to True
> INFO: class to node mapping enabled for queue 'debug'
> INFO: queue 'batch' started state set to True
> INFO: class to node not mapping enabled for queue 'batch' adding class to all nodes
>
>
> old cluster:
> MPBSNodeUpdate(node001,node001,Idle,head)
> MPBSLoadQueueInfo(head,node001,SC)
> INFO: queue 'batch' started state set to True
> INFO: class to node not mapping enabled for queue 'batch' adding class to all nodes
> INFO: queue 'debug' started state set to True
> INFO: class to node mapping enabled for queue 'debug'
>
>
> --
> Jason Williams
>