Enabling High Availability Features
Moab Workload Manager®

21.2 Enabling High Availability Features

21.2.1 High Availability Overview

High availability allows Moab to run on two machines: a primary server and a secondary (fallback) server. Two configuration methods are available. The first uses a networked file system to configure two Moab servers, only one of which operates at a time. The second does not rely on a networked file system: a master Moab server operates until its machine crashes, and after the master has been down for a certain amount of time, the secondary server takes control until the master returns to activity. Site administrators are advised to use the networked file system configuration because failover is nearly immediate and there is less chance of a synchronization failure.

21.2.2.1 Networked File System High Availability Overview (recommended configuration)

When Moab is configured to run on a networked file system (any networked file system that supports file locking will work), the first Moab server that starts locks a particular file. The second Moab server waits on that lock and begins scheduling only when it gains control of it. This method achieves near-instantaneous failover and eliminates the need for the two Moab servers to synchronize information periodically, because both access the same database/checkpoint file.
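
The lock handoff described above can be illustrated with a small Python sketch. This is not Moab's implementation; it only shows the general pattern of one holder taking an exclusive lock and a second party acquiring it once the first lets go. The helper names and the temporary lock path are hypothetical, and the demonstration runs on a local file; whether flock-style locks behave the same way on a given networked file system depends on that file system.

```python
import fcntl
import os
import tempfile


def try_acquire(lock_path):
    """Try to take an exclusive, non-blocking lock on lock_path.

    Returns the open file object if the lock was obtained (keep it open to
    hold the lock), or None if another holder already has it.
    """
    handle = open(lock_path, "a+")
    try:
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


def release(handle):
    """Release the lock by unlocking and closing the file."""
    fcntl.flock(handle, fcntl.LOCK_UN)
    handle.close()


# Hypothetical lock file in a temporary directory; a real deployment would
# place it on the shared file system.
lock_path = os.path.join(tempfile.mkdtemp(), "demo_lock")

primary = try_acquire(lock_path)      # first server: gets the lock
secondary = try_acquire(lock_path)    # second server: lock is already held
primary_has_lock = primary is not None
secondary_blocked = secondary is None

release(primary)                      # first server stops
secondary = try_acquire(lock_path)    # second server can now take over
secondary_took_over = secondary is not None
release(secondary)
```

In this sketch the second party polls with a non-blocking request; a waiting server could equally issue a blocking lock request and simply resume when the lock is granted.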

21.2.2.2 Configuring High Availability on a Networked File System

Because the two Moab servers access the same files, configuration is only required in the moab.cfg file. The two hosts that run Moab must be configured with the SERVER and FBSERVER parameters. File locking is turned on using the FLAGS=filelockha parameter. Finally, the lock file is specified with the HALOCKFILE parameter. The following example illustrates a possible configuration:

moab.cfg
SCHEDCFG[Moab]	SERVER=host1:42559
SCHEDCFG[Moab]	FBSERVER=host2:42559
SCHEDCFG[Moab]	FLAGS=filelockha
SCHEDCFG[Moab]	HALOCKFILE=/opt/moab/.moab_lock

21.2.2.3 Confirming High Availability on a Networked File System

Administrators can run the mdiag -S -v command to see which Moab server is currently scheduling and responding to client requests.

21.2.3.1 Master Slave High Availability Overview

In this configuration, Moab runs on two machines: a primary server and a secondary (fallback) server. While both are running, the secondary server continually updates internal statistics, reservations, and other information to stay synchronized with the primary server. Should the primary server stop running, the secondary server picks up all responsibilities of the primary server and begins scheduling jobs and tracking internal data. When the primary server comes back online, the secondary server hands over its data and resumes its role as the secondary server.

NOTE: By default, the fallback server pings the primary server every 30 seconds. If two successive ping attempts fail, the fallback server takes over scheduling duties. The HAPOLLINTERVAL parameter can be tuned to adjust the responsiveness of the fallback server to failures.
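
The takeover rule in the note above (act only after two successive failed pings) can be sketched as a consecutive-failure counter. This is an illustration of the rule as stated, not Moab's code; the function name and the failures_needed parameter are hypothetical.

```python
def should_take_over(ping_results, failures_needed=2):
    """Return True once `failures_needed` consecutive pings have failed.

    ping_results is an iterable of booleans, oldest first: True means the
    primary answered, False means the ping failed.
    """
    consecutive = 0
    for ok in ping_results:
        # A successful ping resets the count; a failure extends the streak.
        consecutive = 0 if ok else consecutive + 1
        if consecutive >= failures_needed:
            return True
    return False
```

For example, should_take_over([True, False, True, False]) is False because the two failures are not successive, while should_take_over([True, False, False]) is True.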

21.2.3.2 Configuring Master Slave High Availability

Note: When Moab is compiled separately on the primary and fallback servers, ensure that the MBUILD_SKEY defined in include/moab-local.h is the same for both builds.

For high availability to function correctly, both servers must have a properly configured moab.cfg file (this can be the same NFS-mounted file for both servers) containing the following lines:

moab.cfg
SCHEDCFG[mycluster]	SERVER=primaryhostname:3000
SCHEDCFG[mycluster]	FBSERVER=secondaryhostname:3020

Both SERVER and FBSERVER use the format <HOST>[:<PORT>]. The following conditions must also hold for correct operation:

  • Each server must specify a shared key using the CLIENTCFG parameter in the moab-private.cfg file.
  • Each server must be properly configured as an administrator inside the resource manager using the CLIENTCFG AUTH parameter.
  • Each server must be able to communicate with the resource manager. (See the TORQUE/PBS Integration Guide for a specific example.)
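
As a quick illustration of the <HOST>[:<PORT>] format used by SERVER and FBSERVER, the following sketch splits such a value into its parts. The parse_server helper is hypothetical and is not part of Moab; it returns None for the port when the optional part is omitted, since this section does not state which default applies, and it does not handle IPv6 literals.

```python
def parse_server(value):
    """Split a <HOST>[:<PORT>] string into (host, port).

    port is an int, or None when the optional port is omitted.
    """
    host, sep, port = value.partition(":")
    if not host:
        raise ValueError("missing host in %r" % value)
    if sep and not port.isdigit():
        raise ValueError("invalid port in %r" % value)
    return host, int(port) if sep else None
```

With the values from the example above, parse_server("primaryhostname:3000") yields ("primaryhostname", 3000), and parse_server("master") yields ("master", None).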

By default, the secondary server waits for two iterations before deciding to take over as the primary server. During this time (~30 seconds by default) client commands are unresponsive as neither the primary nor secondary servers are servicing requests.

Proper high availability configuration and health status of the primary and fallback servers can be determined using the mdiag -S command.

Example

moab.cfg on master server
SCHEDCFG[mycluster] SERVER=master FBSERVER=backup MODE=NORMAL
...

moab-private.cfg on master server
CLIENTCFG[mycluster] KEY=1dfv-fewv443v  HOST=backup  AUTH=admin1

moab.cfg on fallback server
# (duplicate moab.cfg of the master or the same file using a shared file system)

SCHEDCFG[mycluster] SERVER=master FBSERVER=backup MODE=NORMAL
...

moab-private.cfg on fallback server
CLIENTCFG[mycluster] KEY=1dfv-fewv443v  HOST=master  AUTH=admin1
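
Since both servers must share the same key, a small consistency check can catch a mismatch before it causes trouble. The sketch below compares the KEY attribute of the two CLIENTCFG lines from the example. It assumes the single-line, whitespace-separated NAME=VALUE layout shown above; parse_clientcfg is a hypothetical helper, not Moab's own parser.

```python
def parse_clientcfg(line):
    """Return the NAME=VALUE attributes of a CLIENTCFG line as a dict."""
    tokens = line.split()
    if not tokens or not tokens[0].startswith("CLIENTCFG["):
        raise ValueError("not a CLIENTCFG line: %r" % line)
    return dict(token.split("=", 1) for token in tokens[1:])


# Lines copied from the moab-private.cfg examples above.
master_line = "CLIENTCFG[mycluster] KEY=1dfv-fewv443v  HOST=backup  AUTH=admin1"
fallback_line = "CLIENTCFG[mycluster] KEY=1dfv-fewv443v  HOST=master  AUTH=admin1"

master = parse_clientcfg(master_line)
fallback = parse_clientcfg(fallback_line)

# The shared key must be identical on both servers.
keys_match = master["KEY"] == fallback["KEY"]
```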

21.2.3.3 Confirming Master Slave High Availability Configuration

The following explains how to verify that the high availability configuration is active and working as expected:

To confirm that the fallback Moab server can communicate with the primary Moab server, run mdiag -R -v on the fallback system. The output should show State: Active for the resource manager.

node40:~/# mdiag -R -v
RM[rmnode30]  State: Active
  Type:               PBS  ResourceType: COMPUTE
  Version:            '1.2.0p6-snap.1122589577'
  Nodes Reported:     4
  Flags:              executionServer,noTaskOrdering,typeIsExplicit
  Partition:          rmnode30
  Event Management:   EPORT=15004
  NOTE:  SSS protocol enabled
  Submit Command:     /usr/local/bin/qsub
  DefaultClass:       batch
  RM Performance:     AvgTime=0.01s  MaxTime=1.03s  (218 samples)

RM[internal]  State: Active
  Type:               SSS  
  Version:            'SSS2.0'
  Flags:              executionServer,localQueue,typeIsExplicit
  RM Performance:     AvgTime=0.00s  MaxTime=0.00s  (125 samples)


NOTE: Use 'mrmctl -f -r ' to clear stats/failures.

To confirm the fallback Moab server is correctly communicating with the primary resource manager, use the mdiag -n command, which results in output similar to the following:

compute node summary
Name                    State   Procs      Memory         Opsys

node31                  Idle    1:1        27:27     Linux-2.6
node32                  Idle    1:1        27:27     Linux-2.6
node33                  Idle    1:1        27:27     Linux-2.6
node34                  Idle    1:1        27:27     Linux-2.6
-----                     ---    4:4       108:108        -----

Total Nodes: 4  (Active: 0  Idle: 4  Down: 0)

If you do not get similar output, check the following:

  • The compute nodes are resolvable by the fallback server.
  • (If using TORQUE) The primary resource manager is specified in the $PBS_HOME/server_name file.

21.2.4 Other High Availability Configuration

Moab has many features to improve the availability of a cluster beyond the ability to automatically relocate to another execution server. The following table describes some of these features.

Feature	Description

JOBACTIONONNODEFAILURE	If a node allocated to an active job fails, it is possible for the job to continue running indefinitely even though the output it produces is of no value. Setting this parameter allows the scheduler to automatically preempt these jobs when a node failure is detected, possibly allowing the job to run elsewhere and also allowing other allocated nodes to be used by other jobs.

SCHEDCFG[] RECOVERYACTION	If a catastrophic failure event occurs (SIGSEGV or SIGILL signal is triggered), Moab can be configured to automatically restart, trap the failure, ignore the failure, or behave in the default manner for the specified signal. These actions are specified using the values RESTART, TRAP, IGNORE, or DIE, as in the following example:

moab.cfg
SCHEDCFG[bas] MODE=NORMAL RECOVERYACTION=RESTART