Scheduling FAQ

This FAQ is under development. PLEASE send questions and/or solutions to help get this FAQ properly populated. Your help is greatly appreciated.

How do I prioritize my jobs?
Job Management
Why does my job go into the state 'deferred'?
Jobs go into the state 'deferred' for a number of reasons. The reason a job is deferred is most easily determined by using the 'checkjob' command.
- The job violates system policies.
The checkjob command should be able to provide additional information about the exact cause of the problem. The Maui log should also document the failure in detail, depending upon the setting of the parameter LOGLEVEL.
Maui Behavior
Can I decrease Maui's default poll interval, and if so, what are the consequences?
Maui version 3.0 is unfortunately not event driven. For some resource managers, such as Loadleveler, this cannot be remedied because Loadleveler does not currently support an event driven interface. For PBS systems, it appears that modifications to Maui could allow the resource manager interface to be at least partially event driven, but these changes have not yet been implemented. (Volunteers?)

The main drawback of the polling interface is that newly submitted jobs may wait in the queue for up to RMPOLLINTERVAL seconds before being scheduled. Some sites have chosen to decrease the RMPOLLINTERVAL parameter significantly; some have run large systems (> 256 nodes) with a poll interval of 5 seconds and report no problems. Maui's scheduling algorithm is very efficient, and this frequency will not create a significant CPU draw. However, if LOGLEVEL is set to a high value (i.e., > 3) and/or the log file is located on a remote file system, the system running Maui may become IO or network bound.

Additionally, on PBS systems, Maui 3.0 contacts each PBS MOM on each iteration. This may result in a fair amount of additional and unnecessary network traffic. This overhead can be significantly reduced by decreasing LOGLEVEL and by raising NODEPOLLFREQUENCY (the number of scheduling iterations between node manager queries) so that the MOMs are polled less often.

Other sites have improved job turnaround by inserting a submit wrapper which 'wakes' Maui and causes it to schedule the job immediately. One such wrapper is described in the Loadleveler Integration Guide.
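To put rough numbers on the polling trade-off, the Python sketch below estimates the average submission delay and the MOM query load for a given poll interval. It is illustrative arithmetic, not Maui code, and it rests on stated assumptions: one scheduling iteration per RMPOLLINTERVAL (no wrapper waking Maui early), job submissions spread uniformly in time, and NODEPOLLFREQUENCY counting scheduling iterations between MOM queries. The function names are hypothetical.

```python
"""Back-of-envelope costs of a polling scheduler (illustrative only)."""

SECONDS_PER_HOUR = 3600


def mean_wait(rm_poll_interval: int) -> float:
    """Average seconds a new job waits for the next poll.

    ASSUMPTION: submissions arrive uniformly in time, so on average a job
    lands halfway through a poll interval. The worst case is one full
    interval (a job submitted just after a poll).
    """
    return rm_poll_interval / 2


def mom_queries_per_hour(nodes: int, rm_poll_interval: int,
                         node_poll_frequency: int = 1) -> float:
    """Node manager (MOM) queries per hour.

    ASSUMPTIONS: one scheduling iteration per RMPOLLINTERVAL, and every
    MOM is queried once every `node_poll_frequency` iterations; values
    below 1 are treated as "query on every iteration".
    """
    iterations = SECONDS_PER_HOUR / rm_poll_interval
    return nodes * iterations / max(node_poll_frequency, 1)


for interval in (60, 5):
    print(f"RMPOLLINTERVAL={interval:>2}s  "
          f"mean wait={mean_wait(interval):.1f}s  "
          f"MOM queries/h on 256 nodes={mom_queries_per_hour(256, interval):.0f}")
```

Under these assumptions, dropping the interval from 60 to 5 seconds cuts the average wait from 30 to 2.5 seconds but multiplies MOM traffic by 12, which is why querying MOMs only every Nth iteration matters on large PBS systems.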
What resource managers does Maui currently support?
Maui version 3.0 works with PBS v2.1 through v2.3 and Loadleveler 1.x through 2.2. Some of the new advanced features of LL 2.2, such as memory tracking and arbitrary geometry support, are currently supported only via the extension interface in Maui 3.1. Efforts are under way to extend support to Gridware.
Does Maui honor resource manager node attributes?
Maui honors and supports node attributes/features. It also honors PBS virtual nodes.
Does Maui honor resource manager classes/queues?
Yes. Maui supports classes/queues as well as class/queue node constraints.
Why does PBS occasionally hang when Maui queries it?
It's not my fault! Really! There are a number of problems with the PBS MOM query interface. These can be remedied by taking the following steps:
- Apply the PBS patches provided by Sandia National Laboratories (see the 'PBS Integration Guide').
Scheduling Behavior
What is the best way to backfill?
Surprisingly, numerous studies and simulations have shown that there is not much difference in how you backfill. (Much to my disappointment: it was my Master's thesis. :( ) A continuously optimizing backfill scheduler backfills jobs as fast as they are submitted, so it has few decisions left to make on each iteration; its job/resource selection was already minimized on the previous scheduling iteration. However, Maui supports a number of scheduling algorithms and criteria which you can experiment with. See the parameter documentation for BACKFILLPOLICY and BACKFILLMETRIC for more information.
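The toy Python sketch below illustrates why the choice of backfill policy often matters less than expected. It contrasts a generic first-fit pass (start each queued job, in priority order, that fits) with a generic best-fit pass (repeatedly start the fitting job that uses the most idle nodes). This is not Maui's implementation; the Job type, the helper functions, and the queue are hypothetical, and Maui's own policies and metrics are the ones documented under BACKFILLPOLICY and BACKFILLMETRIC.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Job:
    name: str
    nodes: int
    walltime: int  # requested wallclock limit, seconds


def fits(job: Job, free_nodes: int, window: int) -> bool:
    # A backfill job must fit on the idle nodes AND finish before the
    # reservation held for the top-priority job begins.
    return job.nodes <= free_nodes and job.walltime <= window


def first_fit(queue: List[Job], free_nodes: int, window: int) -> List[str]:
    """Walk the queue in priority order and start every job that fits."""
    started = []
    for job in queue:
        if fits(job, free_nodes, window):
            started.append(job.name)
            free_nodes -= job.nodes
    return started


def best_fit(queue: List[Job], free_nodes: int, window: int) -> List[str]:
    """Repeatedly start the fitting job that uses the most idle nodes."""
    remaining = list(queue)
    started = []
    while True:
        candidates = [j for j in remaining if fits(j, free_nodes, window)]
        if not candidates:
            return started
        best = max(candidates, key=lambda j: j.nodes)
        started.append(best.name)
        free_nodes -= best.nodes
        remaining.remove(best)


# Hypothetical queue (already sorted by priority), 7 idle nodes,
# 1000 s until the next priority reservation starts.
QUEUE = [Job("a", 2, 600), Job("b", 6, 300), Job("c", 4, 900), Job("d", 3, 200)]
print(first_fit(QUEUE, 7, 1000))  # ['a', 'c']  -> 6 of 7 nodes busy
print(best_fit(QUEUE, 7, 1000))   # ['b']       -> 6 of 7 nodes busy
```

The two policies pick different jobs, yet in this small case both leave exactly one node idle, which is the kind of outcome behind the observation that how you backfill makes little overall difference.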
Gotchas!
The node regular expression matches too many nodes. How do I get it to match just the ones I want?
Maui uses the 'regex' library for matching regular expressions on Unix systems. With this library, the expression 'node1' matches 'node1' as expected, but it also matches 'node10', 'node11', and even 'node100'. This can be avoided by specifying the expression as '^node1$', which anchors the match to the start and end of the string. See the regex man page for further information.
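The effect is easy to reproduce outside Maui. Python's re.search, like an unanchored POSIX regexec call, reports a match anywhere in the string, so it shows the same behavior. The node list and the matching() helper below are hypothetical.

```python
import re

# Hypothetical node names
nodes = ["node1", "node10", "node11", "node100", "node2"]


def matching(pattern, names):
    """Return the names the pattern matches anywhere (unanchored search)."""
    rx = re.compile(pattern)
    return [n for n in names if rx.search(n)]


print(matching("node1", nodes))    # ['node1', 'node10', 'node11', 'node100']
print(matching("^node1$", nodes))  # ['node1']
```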
© 2001-2010 Adaptive Computing Enterprises, Inc.