[Mauiusers] More than one FLAGS value in standing reservation

jacksond at supercluster.org jacksond at supercluster.org
Tue Oct 12 10:09:05 MDT 2004


Martin,

   MLocalJobAllocateResources() has been extended to include StartTime 
information.  Regarding your situation, use of the 'Local' infrastructure 
looks very reasonable due to the complexity of your needs.  An alternative 
may be to use priority and reservations to accomplish a similar behavior. 
For example, you could do the following:

   Adjust the priority of bigmem class jobs so they are always higher than 
any other jobs (using class priority).  Associate the bigmem class with 
the bigmem QOS and set the reservation depth for bigmem QOS jobs to a 
high value.  Finally, set the reservation policy to CURRENTHIGHEST.

   As you did in your example, create a standing reservation over the big 
memory nodes with positive affinity for both the short and bigmem classes.  
This will block out medium, long, and verylong jobs but will attract short 
and bigmem jobs.  If there is ever a bigmem job in the queue, it should now 
either be running or have a reservation to run at the earliest possible 
time.  The short jobs will use bigmem resources first if they are 
available and not reserved, but will use standard nodes otherwise.
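
   A minimal maui.cfg sketch of the setup described above.  The host names, 
priority values, reservation depth, and the [0] index are placeholder 
assumptions, and the parameter spellings should be checked against the 
documentation for your Maui version:

```
# Class priority: let bigmem jobs outrank every other class
# (weights and priority value are illustrative assumptions)
CREDWEIGHT            1
CLASSWEIGHT           1
CLASSCFG[bigmem]      PRIORITY=10000 QDEF=bigmem

# Deep priority reservations for jobs in the bigmem QOS
RESERVATIONPOLICY     CURRENTHIGHEST
RESERVATIONDEPTH[0]   10
RESERVATIONQOSLIST[0] bigmem

# Standing reservation over the big memory nodes
# (host names are hypothetical)
SRCFG[bigmem] HOSTLIST=bignode01,bignode02,bignode03,bignode04
SRCFG[bigmem] PERIOD=INFINITY
SRCFG[bigmem] CLASSLIST=bigmem+,short+
```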

   This provides a similar behavior but does not account well for 'over the 
horizon' jobs and their impact.  Moab provides better conditional logic 
and reservation control as well as improved job preemption capabilities 
which may be of use if you are interested.

Best of luck with your cluster,
Dave





  On Tue, 12 Oct 2004, Martin Thompson wrote:

>>    Thanks for the report and the patch.  Actually, the most correct
>> solution is to modify the MUBMFromString() routine.  It should
>> handle both whitespace and comma delimited fields.  Simply adding a
>> comma character to the delimiter list inside this routine will
>> accomplish what you want throughout Maui code.  This change is now
>> available in the latest pre-Maui 3.2.6p11 snapshot.
>
> Ahh, I wondered about that, but I chickened out.  I thought I could
> minimise the damage of my hacking by inserting the while loop rather
> than tweaking a function that is probably used in several other
> places.
> :-)
>
> Thanks for that.
>
> I have another question now.  Would it be a problem to include
> StartTime in the parameter list of MLocalJobAllocateResources()?  I
> wrote my own MLocalJobAllocateResources() -- all it does is tweak
> the feasible node list a little and then it calls
> MJobAllocatePriority() but that function requires StartTime.
>
> Consequently I have to hack src/moab/MSched.c, src/moab/MLocal.c and
> include/moab-proto.h every time I try out a new release of Maui.
> Not much of a big deal really, but I'd be happier if we were using a
> version of Maui without my crusty hacks.
>
> Actually, while I'm on the subject of my MLocalJobAllocateResources()
> function, I wonder if I could summarise what I did, just to check if
> there is a more elegant solution.
>
> We have a cluster with two types of node: stdmem and bigmem.  We have
> 19 stdmem nodes and 4 bigmem nodes.  We have 5 queues: short, medium,
> long, verylong and bigmem.  Jobs from the bigmem queue must use the
> bigmem nodes, jobs from the short queue can only use the bigmem
> nodes if all the stdmem nodes are busy.
>
> This works ok, but we tend to have a lot of medium jobs and
> relatively few bigmem jobs.  So occasionally the bigmem nodes will
> be idle and medium jobs will be queueing up.  However, we don't want
> to let medium jobs run on the bigmem nodes because we're not
> prepared to tie up bigmem nodes with small memory jobs for that long
> (short jobs are long enough).
>
> Someone came up with the idea of letting short jobs use the bigmem
> nodes a little earlier than they would normally do so, thus freeing
> up more stdmem nodes for medium jobs.  Something like this...
>
>  if ( num_free_stdmem_nodes >= num_free_bigmem_nodes )
>  {
>    /* normal behaviour, as before */
>    SRCFG[stdmem] CLASSLIST=short+,medium+,long+,verylong+
>    SRCFG[bigmem] CLASSLIST=bigmem+,short-
>  }
>  else
>  {
>    /* short jobs now prefer bigmem nodes because there are more */
>    /* bigmem nodes free than stdmem nodes free                  */
>    SRCFG[stdmem] CLASSLIST=short-,medium+,long+,verylong+
>    SRCFG[bigmem] CLASSLIST=bigmem+,short+
>  }
>
> I couldn't think of a way of doing that purely by changing maui.cfg.
>
> Instead, I wrote a MLocalJobAllocateResources() function that looks
> at the feasible node list, and if there are more bigmem nodes free
> than stdmem nodes free, and it's a short job, then it makes all the
> stdmem nodes in the feasible node list unavailable.
>
> It works, but I'm not desperately happy about the fact I have node
> features and queue names hard-wired into the code.
>
> Might there be a more elegant solution out there?  Is the whole idea
> totally bogus?
>
> Many thanks.
>
> Mart.
>
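
The node-filtering rule Martin describes (for a short job, hide the stdmem 
nodes whenever more bigmem nodes than stdmem nodes are free) could be 
sketched roughly as below.  This is not Maui code: node_t and 
prefer_bigmem_for_short() are hypothetical names, and a real 
MLocalJobAllocateResources() would work on Maui's own node list structures 
before handing the list on to MJobAllocatePriority().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node record; Maui's real node structures are not used here. */
typedef struct {
    bool is_bigmem;  /* node carries the bigmem feature       */
    bool is_free;    /* node is currently idle                */
    bool feasible;   /* node may still be offered to the job  */
} node_t;

/* Count the free nodes of one kind (bigmem or stdmem). */
static size_t count_free(const node_t *nodes, size_t n, bool bigmem)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (nodes[i].is_free && nodes[i].is_bigmem == bigmem)
            count++;
    }
    return count;
}

/*
 * For a short job, mark every stdmem node infeasible when more bigmem
 * nodes than stdmem nodes are free.  Returns the number of nodes removed
 * from the feasible set; 0 means the normal behaviour applies.
 */
size_t prefer_bigmem_for_short(node_t *nodes, size_t n, bool job_is_short)
{
    assert(nodes != NULL || n == 0);

    if (!job_is_short)
        return 0;

    size_t free_std = count_free(nodes, n, false);
    size_t free_big = count_free(nodes, n, true);

    if (free_std >= free_big)
        return 0;  /* normal behaviour, as before */

    size_t removed = 0;
    for (size_t i = 0; i < n; i++) {
        if (!nodes[i].is_bigmem && nodes[i].feasible) {
            nodes[i].feasible = false;
            removed++;
        }
    }
    return removed;
}
```

Passing the job class in as a flag, rather than comparing queue names inside 
the function, keeps the hard-wired names Martin was unhappy about at the 
call site.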

