Maui Scheduler Administrator's Guide

version 3.2

Copyright © 1999-2005 Cluster Resources, Inc. All Rights Reserved.
Distribution of this document for commercial purposes in either
hard or soft copy form is strictly prohibited
without prior written consent from Cluster Resources, Inc.


    The Maui Scheduler is a policy engine that gives sites control over when, where, and how resources such as processors, memory, and disk are allocated to jobs.  It also provides mechanisms to optimize the use of these resources, monitor system performance, diagnose problems, and generally manage the system.

Table of Contents:

    1.0   Philosophy and Goals of the Maui Scheduler

    2.0   Installation and Initial Configuration
        2.1  Building and Installing Maui
        2.2  Initial Configuration
        2.3  Initial Testing

    3.0  Scheduler Basics
        3.1  Layout of Scheduler Components
        3.2  Scheduling Environment and Objects
        3.3  Scheduling Iterations and Job Flow
        3.4  Configuring the Scheduler

    4.0  Scheduler Commands
        4.1  Client Overview
        4.2  Monitoring System Status
        4.3  Managing Jobs
        4.4  Managing Reservations
        4.5  Configuring Policies
        4.6  End User Commands
        4.7  Miscellaneous Commands

    5.0  Prioritizing Jobs and Allocating Resources
        5.1  Job Priority
        5.2  Node Allocation
        5.3  Node Access
        5.4  Node Availability
        5.5  Task Distribution

    6.0  Managing Fairness - Throttling Policies, Fairshare, and Allocation Management
        6.1  Fairness Overview
        6.2  Throttling Policies
        6.3  Fairshare
        6.4  Allocation Management

    7.0  Controlling Resource Access - Reservations, Partitions, and QoS Facilities
        7.1  Advance Reservations
        7.2  Partitions
        7.3  QoS Facilities

    8.0  Optimizing Scheduling Behavior - Backfill, Node Sets, and Preemption
        8.1  Optimization Overview
        8.2  Backfill
        8.3  Node Sets
        8.4  Preemption

    9.0  Evaluating System Performance - Statistics, Profiling, Testing, and Simulation
        9.1  Scheduler Performance Evaluation Overview
        9.2  Accounting - Job and System Statistics
        9.3  Profiling Current and Historical Usage
        9.4  Testing New Versions and Configurations
        9.5  Answering 'What If?' Questions with the Simulator

    10.0  Managing Shared Resources - SMP Issues and Policies
        10.1  Consumable Resource Handling
        10.2  Load Balancing Features
        10.3  Resource Usage Tracking
        10.4  Resource Usage Limits

    11.0  General Job Administration
        11.1  Deferred Jobs and Job Holds
        11.2  Job Priority Management
        11.3  Suspend/Resume Handling
        11.4  Checkpoint/Restart
        11.5  Job Dependencies
        11.6  Setting Job Defaults and Per Job Limits
        11.7  General Job Policies
        11.8  Using a Local Queue

    12.0  General Node Administration
        12.1  Node Location (Partitions, Frames, Queues, etc.)
        12.2  Node Attributes (Node Features, Speed, etc.)
        12.3  Node Specific Policies (MaxJobPerNode, etc.)
        12.4  Configuring Node-Locked Consumable Generic Resources (tape drives, node-locked licenses, etc.)

    13.0  Resource Managers and Interfaces
        13.1  Resource Manager Overview
        13.2  Resource Manager Configuration
        13.3  Resource Manager Extensions
        13.4  Adding Resource Manager Interfaces

    14.0  Troubleshooting and System Maintenance
        14.1  Internal Diagnostics
        14.2  Logging Facilities
        14.3  Using the Message Buffer
        14.4  Handling Events with the Notification Routine
        14.5  Issues with Client Commands
        14.6  Tracking System Failures
        14.7  Problems with Individual Jobs

    15.0  Improving User Effectiveness
        15.1  User Feedback Loops
        15.2  User Level Statistics
        15.3  Enhancing Wallclock Limit Estimates
        15.4  Providing Resource Availability Information
        15.5  Job Start Time Estimates
        15.6  Collecting Performance Information on Individual Jobs

    16.0  Simulations
        16.1  Simulation Overview
        16.2  Resource Traces
        16.3  Workload Traces
        16.4  Simulation Specific Configuration

    17.0  Miscellaneous
        17.1  User Feedback
        17.2  Grid Scheduling
        17.3  Enabling High Availability Features
        17.4  Using the Application Scheduling Library

        Appendix A:  Case Studies
        Appendix B:  Extension Interface
        Appendix C:  Adding New Algorithms
        Appendix D:  Structure Limits
        Appendix E:  Security Configuration
        Appendix F:  Parameters Overview
        Appendix G:  Commands Overview
        Appendix H:  Interfacing to Maui
        Appendix I:   Considerations for Large Clusters
        Appendix J:   Differences Guide
        Appendix K:  Maui-Moab Comparison