This FAQ is under development. PLEASE send questions and/or solutions to help get this FAQ properly populated. Your help is greatly appreciated.
How do I prioritize my jobs?
Why won't a job run?
How do I increase system utilization?
How do I improve turnaround time for certain jobs?
Handling Maui Gotchas!
Jobs can enter the Deferred state for a number of reasons, and the 'checkjob' command is the easiest way to find out which one applies. Common causes include:
- The job violates system policies
- The job does not have access to the QOS which was requested
- The resources requested by the job do not currently exist in an available state (i.e., Idle or Busy)
- Maui is configured to use an allocation manager and the job does not currently have adequate allocations to run
- Maui attempted to start the job but the underlying resource manager (e.g., PBS or Loadleveler) rejected the request
The checkjob command should be able to provide some additional information
about the exact cause of the problem. The Maui log should document
the failure in detail depending upon the setting of the parameter LOGLEVEL.
To disable Maui's defer mechanism, set the DEFERTIME parameter to '0'. To release a job which is currently deferred, issue 'releasehold -a <JOBID>'.
Maui version 3.0 is unfortunately not event driven. For some resource managers such as Loadleveler, this cannot be remedied because Loadleveler does not currently support an event driven interface. For PBS systems, it appears that modifications to Maui could allow the resource manager interface to be at least partially event driven but these changes have not yet been implemented. (volunteers?) The main drawback of the polling interface is that newly submitted jobs may wait in the queue for up to <RMPOLLINTERVAL> seconds before being scheduled.
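The polling delay described above can be illustrated with a little arithmetic. The following is a minimal sketch, not Maui code: it assumes polls occur at exact multiples of the poll interval and that a job submitted exactly at a poll is picked up immediately. The function name and the 60 second interval are hypothetical values chosen for illustration.

```python
def wait_until_next_poll(submit_time, poll_interval):
    """Seconds a job submitted at submit_time waits for the next poll.

    Assumption: polls happen at 0, poll_interval, 2 * poll_interval, ...
    and a job submitted exactly at a poll is seen by that poll.
    """
    remainder = submit_time % poll_interval
    return 0 if remainder == 0 else poll_interval - remainder


poll_interval = 60            # hypothetical RMPOLLINTERVAL value, in seconds
submit_times = range(0, 600)  # one job submitted every second for ten minutes

waits = [wait_until_next_poll(t, poll_interval) for t in submit_times]
worst = max(waits)
average = sum(waits) / len(waits)

print("worst-case wait:", worst, "seconds")
print("average wait:", average, "seconds")
```

Under these assumptions the worst case stays just below the poll interval and the average is roughly half of it, which is why shortening the interval shortens the time a new job sits unseen in the queue.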
Some sites have chosen to decrease the RMPOLLINTERVAL parameter significantly. Some have run large systems (> 256 nodes) with a poll interval of 5 seconds and report no problems. Maui's scheduling algorithm is very efficient, and polling at this frequency will not create a significant CPU draw. However, if LOGLEVEL is set to a high value (i.e., > 3) and/or the log file is located on a remote file system, the system running Maui may become I/O or network bound. Additionally, on PBS systems, Maui 3.0 contacts each PBS MOM on each iteration, which may result in a fair amount of additional and unnecessary network traffic. This overhead can be significantly reduced by decreasing LOGLEVEL and by polling the node managers less often via the NODEPOLLFREQUENCY parameter.
Other sites have improved job turnaround by inserting
a submit wrapper which 'wakes' Maui and causes it to immediately schedule
the job. One such wrapper is described in the Loadleveler
Maui version 3.0 works with PBS v2.[1-3] and Loadleveler
1.x through 2.2. Some of the new advanced features of LL 2.2, such
as memory tracking and arbitrary geometry support, are currently supported
only via the extension interface in Maui 3.1. Efforts are currently under way
to extend support to Gridware.
Maui honors and supports node attributes/features.
It also honors PBS virtual nodes.
Yes, Maui supports classes/queues as well as class/queue
It's not my fault! Really! There are a number of problems with the PBS MOM query interface. These can be remedied by taking the following steps:
- apply the PBS patches provided by Sandia National Laboratories (see the 'PBS
- build PBS without using RPP (use 'configure' option '--disable-rpp')
Surprisingly, numerous studies and simulations have
shown that there is not much difference in how you backfill. (Much to my
disappointment; it was my Master's thesis :( ) A continuously optimizing
backfill scheduler will backfill jobs as fast as they are submitted leaving
the scheduler with few decisions to make each iteration because its job/resource
selection was minimized on the previous scheduling iteration. However,
Maui supports a number of scheduling algorithms and criteria which you
can play with. See the parameters documentation on BACKFILLPOLICY
and BACKFILLMETRIC for more
Maui uses the 'regex' library for matching regular expressions on Unix systems. With this library, the expression 'node1' matches 'node1' as expected, but it also matches 'node10', 'node11', and even 'node100', because an unanchored expression matches anywhere within a string. To avoid this, specify the expression as '^node1$', which anchors the match to the start and end of the string so that only the exact name matches. See the regex man page for further information.
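The difference between the unanchored and anchored forms can be seen in a short sketch. It uses Python's `re` module rather than the Unix regex library Maui links against, so it only illustrates the anchoring behaviour and is not Maui's own matching code. The node list and the helper name are hypothetical.

```python
import re

# Hypothetical node names, chosen only to illustrate the matching behaviour.
nodes = ["node1", "node10", "node11", "node100", "node2"]


def matching_nodes(pattern, names):
    """Return the names in which the pattern is found anywhere (unanchored search)."""
    regex = re.compile(pattern)
    return [name for name in names if regex.search(name)]


# Without anchors, 'node1' is found inside every name that contains it.
unanchored = matching_nodes("node1", nodes)

# With '^' and '$', the whole name has to be exactly 'node1'.
anchored = matching_nodes("^node1$", nodes)

print("unanchored:", unanchored)
print("anchored:  ", anchored)
```

The first list contains 'node1', 'node10', 'node11', and 'node100', while the second contains only 'node1'.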