Maui can be used as an external scheduler for the
PBS resource management system. In this configuration, PBS manages the
job queue and the compute resources while Maui queries the PBS Server and
the PBS MOM's to obtain up to date job and node information. Using
this information, Maui directs PBS to manage jobs in accordance with specified
Maui policies, priorities, and reservations.
Steps
Maui drives PBS via the PBS scheduling API.
To enable Maui scheduling, the following steps must be taken:
1. Install PBS
Keep track of the PBS target directory, $PBSTARGDIR.
2. Install Maui
If you have a non-standard
PBS installation, You may need to modify src/Makefile and change PBSIP and PBSLP values and references as necessary for your local site configuration.
The configure script
will automatically setup Maui so that the user running configure will become
the default Primary Maui Administrator, $MAUIADMIN. This can
be changed by modifying the 'ADMIN <USERNAME>' line in the maui.cfg
file. The primary administrator is the first user listed after the
ADMIN1 parameter and is the
ID under which the Maui daemon will run.
Some Tru64 and IRIX systems
have a local libnet library which conflicts with PBS's libnet library.
To resolve this, try setting PBSLIB to '${PBSLIBDIR}/libnet.a -lpbs' in
the Maui Makefile.
Maui is 64 bit compatible.
If PBS is compiled in 64 bit mode, Maui will likewise need to be compiled
in this manner in order to utilize the PBS scheduling API. (i.e., for IRIX
compilers, add '-64' to OSCCFLAGS and OSLDFLAGS variables
in the Makefile).
3. Configure PBS
Make sure $MAUIADMIN a PBS admin.
Maui communicates with both pbs_server and pbs_mom daemons. The $MAUIADMIN
should be authorized to talk to both PBS daemons. (suggestions)
For security purposes,
sites may want to run Maui under a non-root user id, the mom_priv/config files
must be world-readable and contain the line '$restricted *.<YOURDOMAIN>
(i.e., '$restricted *.uconn.edu').
PBS nodes can be configured
as time sharedorspace shared according to local
needs.
Maui utilizes PBS's scheduling port to obtain real-time event information from PBS regarding job and node transitions. Leaving the default qmgr setting of 'set server scheduling=True' will allow Maui to receive and process this real-time information.
Do not start the TORQUE's pbs_sched daemon. This is the default scheduler for TORQUE and Maui/Moabwill provide this service.
Maui's user interface port set using the parameter SCHEDCFG or SERVERPORT is used for user-scheduler communication. This port must be different from the PBS scheduler port used for resource manager-scheduler communication.
PBS supports the concept
of virtual nodes. Using this feature, Maui can individually
schedule processors on SMP nodes. The PBS Administrator's Guide explains
how to set up the '$PBS_HOME/server_priv/nodes' file to enable this capability.
(i.e., <NODENAME> np=<VIRTUAL NODE COUNT>).
4. Configure Maui
Next, specify PBS as the resource manager. This should be taken care of by 'configure', but if not, the following
parameter must be specified in the maui.cfg file:
If a non-standard PBS configuration is being used, additional Maui parameters may be required to point Maui to the right location:
Maui 3.0 and earlier only supports the following node specifications:
Maui 3.2 supports basic scheduling
of all PBS node specifications but will provide only limited optimization
services for these jobs.
Maui is by default very
liberal in its interpretation of <NODECOUNT>:PPN=<X>. In its
standard configuration, Maui interprets this as 'give the job <NODECOUNT>*<X>
tasks with AT LEAST <X> tasks per node'. Set the JOBNODEMATCHPOLICY
parameter to EXACTNODE to have Maui support PBS's default allocation
behavior of <NODECOUNT> nodes with exactly <X> tasks per node.
Maui features not supported by PBS:
PBS does not support the
concept of a job QOS or other extended scheduling features by default.
This can be fixed using the techniques described here.
See the RM Extensions Overview for
more information.
Earlier versions of PBS do not maintain job completion information. Thus, an external scheduler
cannot determine if the job completed successfully or if internal PBS problems
occurred preventing the job from being properly updated. This problem
will not in any way affect proper scheduling of jobs but may, potentially,
affect scheduler statistics. If your site is prone to frequent PBS
hangs,
you may want to set the Maui PURGETIME
parameter to allow Maui to hold job information in memory for a period
of time until PBS recovers.
Trouble-shooting
Common Problems:
PBS versions prior to 2.4 hang when MOM's have troubles.
On TRU64 systems, the PBS 'libpbs'
library does not properly export a number of symbols required by Maui.
This can be worked around by modifying the Maui Makefile to link the PBS
'rm.o' object file directly into Maui.