
Scheduling FAQ


This FAQ is under development.  PLEASE send questions and/or solutions to help get this FAQ properly populated.  Your help is greatly appreciated.

How do I prioritize my jobs?
Why won't a job run?
How do I increase system utilization?
How do I improve turnaround time for certain jobs?
Handling Maui Gotchas!

Job Management



Why does my job go into the state 'deferred'?

    Jobs enter the deferred state for a number of reasons, listed below. The reason a particular job is deferred is most easily determined with the 'checkjob' command.

        -    The job violates system policies
        -    The job does not have access to the QOS which was requested
        -    The resources requested by the job do not currently exist in an available state (i.e., Idle or Busy)
        -    Maui is configured to use an allocation manager and the job does not currently have adequate allocations to run
        -    Maui attempted to start the job but the underlying resource manager (e.g., PBS or Loadleveler) rejected the request

The checkjob command should report additional detail about the exact cause of the problem, and the Maui log records the failure in more or less detail depending on the setting of the LOGLEVEL parameter.
To disable Maui's defer mechanism, set the DEFERTIME parameter to '0'. To release a job that is currently deferred, issue 'releasehold -a <JOBID>'.
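    When several deferred jobs need to be released at once, the 'releasehold -a <JOBID>' command can simply be issued in a loop. The Python sketch below assumes only that 'releasehold' is on the PATH and accepts the '-a <JOBID>' form given above; the function name and its injectable 'runner' argument are hypothetical helpers for illustration, not part of Maui.

```python
import subprocess
from typing import Callable, Iterable, List


def release_deferred_jobs(
    job_ids: Iterable[str],
    runner: Callable[..., object] = subprocess.run,
) -> List[List[str]]:
    """Issue 'releasehold -a <JOBID>' for each job and return the commands run.

    Hypothetical helper: 'runner' defaults to subprocess.run but can be
    swapped out for a dry run or a test.
    """
    commands = []
    for job_id in job_ids:
        cmd = ["releasehold", "-a", job_id]
        # check=True makes subprocess.run raise CalledProcessError
        # if releasehold exits with a non-zero status.
        runner(cmd, check=True)
        commands.append(cmd)
    return commands
```

    For example, release_deferred_jobs(["1234", "1235"]) would run 'releasehold -a 1234' followed by 'releasehold -a 1235', stopping at the first failure.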


Maui Behavior



Can I decrease Maui's default poll interval, and if so what are the consequences?

    Maui version 3.0 is unfortunately not event-driven. For some resource managers, such as Loadleveler, this cannot be remedied because the resource manager does not currently support an event-driven interface. For PBS systems, it appears that modifications to Maui could make the resource manager interface at least partially event-driven, but these changes have not yet been implemented (volunteers?). The main drawback of the polling interface is that newly submitted jobs may wait in the queue for up to <RMPOLLINTERVAL> seconds before being scheduled.

    Some sites have chosen to decrease the RMPOLLINTERVAL parameter significantly; some running large systems (more than 256 nodes) with a poll interval of 5 seconds report no problems. Maui's scheduling algorithm is very efficient, and this frequency does not create a significant CPU load. However, if LOGLEVEL is set to a high value (i.e., above 3) and/or the log file is located on a remote file system, the system running Maui may become I/O- or network-bound. Additionally, on PBS systems, Maui 3.0 contacts each PBS MOM on every iteration, which can generate a fair amount of additional, unnecessary network traffic. This overhead can be significantly reduced by lowering LOGLEVEL and by querying the node manager less often, i.e., increasing NODEPOLLFREQUENCY, which sets the number of scheduling iterations between node manager queries.
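    The trade-off above can be put into rough numbers. The Python sketch below assumes one scheduling iteration per RMPOLLINTERVAL (in practice Maui may iterate more often, for example when jobs complete), one query per PBS MOM per node poll, submissions arriving uniformly in time, and treats NODEPOLLFREQUENCY as the number of iterations between node queries, with 0 or 1 meaning every iteration (an assumption). The function names are hypothetical.

```python
from typing import Tuple


def submit_delay(rm_poll_interval_s: float) -> Tuple[float, float]:
    """(mean, worst-case) seconds a new job waits before Maui first sees it.

    Assumes submissions arrive uniformly within a poll interval.
    """
    return rm_poll_interval_s / 2, float(rm_poll_interval_s)


def mom_queries_per_hour(
    nodes: int, rm_poll_interval_s: float, node_poll_frequency: int = 1
) -> float:
    """Approximate PBS MOM queries per hour.

    Assumes one iteration per poll interval and one query per MOM each
    time nodes are polled; a node_poll_frequency of 0 is treated as 1.
    """
    iterations_per_hour = 3600 / rm_poll_interval_s
    return nodes * iterations_per_hour / max(node_poll_frequency, 1)
```

    Under these assumptions, a 256-node PBS system polled every 5 seconds issues about 184,320 MOM queries per hour; setting NODEPOLLFREQUENCY to 10 cuts that to about 18,432, while new jobs still wait at most 5 seconds (2.5 on average) to be seen.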

    Other sites have improved job turnaround by inserting a submit wrapper which 'wakes' Maui and causes it to immediately schedule the job.  One such wrapper is described in the Loadleveler Integration Guide.



What resource managers does Maui currently support?

    Maui version 3.0 works with PBS v2.1 through v2.3 and Loadleveler 1.x through 2.2. Some of the newer features of LL 2.2, such as memory tracking and arbitrary geometry support, are currently supported only via the extension interface in Maui 3.1. Efforts are under way to extend support to Gridware.



Does Maui honor resource manager node attributes?

    Yes. Maui honors node attributes/features, as well as PBS virtual nodes.



Does Maui honor resource manager classes/queues?

    Yes, Maui supports classes/queues as well as class/queue node constraints.



Why does PBS occasionally hang when Maui queries it?

    It's not my fault! Really! There are a number of problems with the PBS MOM query interface. These can be remedied by taking the following steps:

        - apply the PBS patches provided by Sandia National Laboratories (see the 'PBS Integration Guide')
        - build PBS without using RPP (use 'configure' option '--disable-rpp')
 
 

Scheduling Behavior



What is the best way to backfill?

    Surprisingly, numerous studies and simulations have shown that the choice of backfill method makes little difference (much to my disappointment, since it was my Master's thesis :( ). A continuously optimizing backfill scheduler backfills jobs as fast as they are submitted, so each iteration leaves it with few decisions to make: its job/resource choices were already narrowed on the previous scheduling iteration. However, Maui supports a number of backfill algorithms and criteria you can experiment with. See the parameter documentation for BACKFILLPOLICY and BACKFILLMETRIC for more information.
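    To make the idea concrete, the Python sketch below compares two simple ways of choosing backfill jobs. It is loosely modeled on the first-fit and best-fit styles of BACKFILLPOLICY with processor count as the metric, but it is a simplified illustration, not Maui's implementation: it assumes a job may be backfilled only if it fits in the idle processors and its requested walltime ends before the top-priority job's reservation (the 'window'). The Job type and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Job:
    name: str
    procs: int     # processors requested
    walltime: int  # requested wallclock limit, in seconds


def fits(job: Job, free_procs: int, window: int) -> bool:
    # Backfill only jobs that fit in the idle processors and finish before
    # the reserved start of the top-priority job, so it is never delayed.
    return job.procs <= free_procs and job.walltime <= window


def first_fit(queue: List[Job], free_procs: int, window: int) -> List[Job]:
    """Walk the queue in priority order, starting each job that fits."""
    started = []
    for job in queue:
        if fits(job, free_procs, window):
            started.append(job)
            free_procs -= job.procs
    return started


def best_fit(queue: List[Job], free_procs: int, window: int) -> List[Job]:
    """Repeatedly start the fitting job that uses the most processors."""
    remaining = list(queue)
    started = []
    while True:
        candidates = [j for j in remaining if fits(j, free_procs, window)]
        if not candidates:
            return started
        # max() returns the first maximal job, so ties keep priority order.
        best = max(candidates, key=lambda j: j.procs)
        started.append(best)
        remaining.remove(best)
        free_procs -= best.procs
```

    With 8 idle processors, a one-hour window, and a queue of jobs needing 2, 4, and 6 processors for 10 minutes each, first_fit starts the 2- and 4-processor jobs (two processors stay idle), while best_fit starts the 6- and 2-processor jobs and fills the machine. As noted above, over a realistic stream of submissions such per-iteration differences tend to matter less than one might expect.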


Gotchas!



The node regular expression matches too many nodes. How do I get it to just match the ones I want?

    Maui uses the 'regex' library for matching regular expressions on Unix systems. With this library, the expression 'node1' matches 'node1' as expected, but it also matches 'node10', 'node11', and even 'node100'. This can be avoided by specifying the expression as '^node1$', which anchors the match to the start and end of the string. See the regex man page for further information.
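    The effect is easy to check outside Maui. The Python sketch below uses the standard 're' module, whose search() function, like POSIX regexec(), reports a match anywhere in the string unless the pattern is anchored. Python's regex syntax is not identical to the Unix regex library in general, but these simple patterns behave the same in both. The matching_nodes() helper is hypothetical and only illustrates the matching rule; it is not part of Maui.

```python
import re
from typing import Iterable, List


def matching_nodes(pattern: str, nodes: Iterable[str]) -> List[str]:
    # re.search() succeeds if the pattern matches anywhere in the name,
    # so an unanchored 'node1' also matches 'node10', 'node100', etc.
    regex = re.compile(pattern)
    return [n for n in nodes if regex.search(n)]


nodes = ["node1", "node10", "node11", "node100", "node2"]
print(matching_nodes("node1", nodes))    # ['node1', 'node10', 'node11', 'node100']
print(matching_nodes("^node1$", nodes))  # ['node1']
```

    Anchors combine with bracket expressions in the usual way; for example, '^node[1-4]$' selects node1 through node4 but not node14.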
