Installation Notes for Moab and Torque on the Cray XT4

This document provides information on the steps to install Moab and Torque on a Cray XT4 system.


Overview

Moab and Torque can be used to manage the batch system on a Cray XT4 supercomputer. This document describes how to configure Moab to use Torque together with the native resource manager interface, bringing Moab's scheduling capabilities to the Cray XT4.


Torque Installation Notes


Download the latest Torque release

Download the latest Torque release from Cluster Resources, Inc.

Example 1. Download Torque

# cd /rr/current/software

# wget http://www.clusterresources.com/downloads/torque/torque-2.2.0.tar.gz


Unpack the Torque tarball

Using xtopview, unpack the Torque tarball into the software directory in the shared root.

Example 2. Unpack Torque

# xtopview

default/:/ # cd /software

default/:/software # tar -zxvf torque-2.2.0.tar.gz


Configure Torque

While still in xtopview, run configure with the options set appropriately for your installation. Run ./configure --help to see a list of configure options. CRI recommends installing the Torque binaries into /opt/torque/$version and creating a symbolic link to it at /opt/torque/default. At a minimum, you must specify the hostname of the Torque server host (--with-default-server) if it differs from the host on which Torque is compiled. For XT4 installations, the Torque server host is normally the sdb node.

Example 3. Run configure

default/:/software # cd torque-2.2.0

default/:/software/torque-2.2.0 # ./configure --prefix=/opt/torque/2.2.0 --with-server-home=/var/spool/torque --with-default-server=nid00003 --enable-syslog


Compile and Install Torque

While still in xtopview, compile and install torque into the shared root. You may also need to link /opt/torque/default to this installation. Exit xtopview.

Example 4. Make and Make Install

default/:/software/torque-2.2.0 # make

default/:/software/torque-2.2.0 # make packages

default/:/software/torque-2.2.0 # make install

default/:/software/torque-2.2.0 # ln -sf /opt/torque/2.2.0/ /opt/torque/default

default/:/software/torque-2.2.0 # exit


Copy your Torque server directory to your Torque server host

In this example we assume the Torque server will run on the sdb node (nid 3). If you install Torque with its server home in /var, as in this example, and your /var filesystem is served from the boot node under /snv, log in to sdb and determine its nid with 'cat /proc/cray_xt/nid'. The server home is then copied to /snv/<nid>/var/spool on the boot node.

Example 5. Copy out torque home directory to server

# cd /rr/current/var/spool

# cp -pr torque /snv/3/var/spool
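If you prefer not to hard-code the nid 3 used above, the destination can be derived from the nid that sdb reports. This is a minimal sketch, assuming that /proc/cray_xt/nid contains just the decimal nid and that each node's persistent /var is exported as /snv/<nid>/var on the boot node, as in the example; the function name snv_spool_dir is hypothetical.

```shell
# Hypothetical helper: print the /snv spool directory for a given nid.
# Assumes the boot node serves each node's persistent /var as /snv/<nid>/var.
snv_spool_dir() {
    # Strip whitespace (e.g. the trailing newline from /proc/cray_xt/nid).
    nid=$(printf '%s' "$1" | tr -d '[:space:]')
    # Reject empty or non-numeric input.
    case $nid in
        ''|*[!0-9]*) echo "invalid nid: '$1'" >&2; return 1 ;;
    esac
    printf '/snv/%s/var/spool\n' "$nid"
}
```

For example, after reading the nid on sdb, run `cp -pr torque "$(snv_spool_dir 3)"` from /rr/current/var/spool on the boot node.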


Stage out mom dirs to login nodes

Stage out the mom directories and the client server information to all login nodes. This example assumes that you are using persistent /var filesystems mounted from /snv on the boot node and that you have three login nodes with nids 4, 64 and 68. If your login nodes instead use a RAM /var filesystem, it is populated from a skeleton tarball on the boot node (/rr/current/.shared/var-skel.tgz), and these files must be added to that tarball. Place the hostname of the sdb node (nid00003 in this example) in each server_name file.

Example 6. Copy out mom dirs and client server info

# cd /rr/current/software/torque-2.2.0/tpackages/mom/var/spool

# for i in 4 64 68
> do
>   cp -pr torque /snv/$i/var/spool
>   echo nid00003 > /snv/$i/var/spool/torque/server_name
>   # Uncomment the following if user IDs are not resolvable from the pbs_server host
>   # echo "QSUBSENDUID true" > /snv/$i/var/spool/torque/torque.cfg
> done
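The staging loop can also be wrapped in a function so that the target root can first be pointed at a scratch directory for a dry run, before anything under /snv is touched. This is a sketch, assuming the same layout as the example (the mom spool tree under tpackages/mom/var/spool, persistent /var under /snv/<nid>). The function name stage_mom_dirs and the SNV_ROOT variable are hypothetical.

```shell
# Hypothetical helper: copy the pbs_mom spool tree to each login node's
# persistent /var and record the pbs_server host in server_name.
#   $1     directory that contains the "torque" mom spool tree
#   $2     pbs_server host name (for example nid00003)
#   $3...  login node nids (for example 4 64 68)
# SNV_ROOT defaults to /snv; override it to rehearse in a scratch directory.
stage_mom_dirs() {
    src=$1 server=$2
    shift 2
    root=${SNV_ROOT:-/snv}
    for nid in "$@"; do
        dest="$root/$nid/var/spool"
        mkdir -p "$dest" || return 1
        cp -pr "$src/torque" "$dest" || return 1
        echo "$server" > "$dest/torque/server_name"
        # Uncomment if user IDs are not resolvable on the pbs_server host:
        # echo "QSUBSENDUID true" > "$dest/torque/torque.cfg"
    done
}
```

To rehearse, run `SNV_ROOT=/tmp/snv-test stage_mom_dirs . nid00003 4 64 68` from the tpackages/mom/var/spool directory and inspect the result. Then rerun the command without SNV_ROOT.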
        


Setup the torque server

On the Torque server host (normally the sdb node), configure the Torque server by recording its hostname and running the torque.setup script.

Example 7. Set the server name and run torque.setup

# hostname > /var/spool/torque/server_name

# export PATH=/opt/torque/default/sbin:/opt/torque/default/bin:$PATH

# cd /software/torque-2.2.0

# ./torque.setup root


Customize the server parameters

Grant access and submit permission to your login nodes. To enable host access, set acl_host_enable to true and add the nid hostnames of your login nodes to acl_hosts. To allow job submission from the same login nodes, also add them to submit_hosts, but this time use their hostnames as returned by the hostname command.

Example 8. Customize server settings

Enable scheduling to allow Torque events to be sent to Moab. Note: If this is not set, Moab will automatically set it on startup.

# qmgr -c "set server scheduling = true"

Keep information about completed jobs around for a time so that Moab can detect and record their completion status. Note: If this is not set, Moab will automatically set it on startup.

# qmgr -c "set server keep_completed = 300"

Set the default node count for a job to be 1.

# qmgr -c "set server resources_default.nodes = 1"

Set resources_available.nodes equal to the maximum number of procs that can be requested in a job.

# qmgr -c "set server resources_available.nodes = 12500"

Do this for each queue individually as well.

# qmgr -c "set queue batch resources_available.nodes = 12500"

Only allow jobs submitted from hosts specified by the acl_hosts parameter.

# qmgr -c "set server acl_host_enable = true"

# qmgr -c "set server acl_hosts += nid00004"

# qmgr -c "set server acl_hosts += nid00064"

# qmgr -c "set server acl_hosts += nid00068"

# qmgr -c "set server submit_hosts += login1"

# qmgr -c "set server submit_hosts += login2"

# qmgr -c "set server submit_hosts += login3"

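Uncomment and run the following only if job owners do not have accounts on the pbs_server host. It lets the server accept jobs for users it cannot resolve locally.

# #qmgr -c "set server disable_server_id_check = true"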
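On systems with many login nodes, the acl_hosts and submit_hosts entries can be generated from a single list instead of being typed one by one. The sketch below only prints the qmgr commands so you can review them first. It assumes nid hostnames use the zero-padded nidNNNNN form shown in this document. The function name print_login_acls and its nid:hostname argument format are hypothetical.

```shell
# Hypothetical helper: print qmgr commands that grant host access (by nid
# hostname) and submit permission (by login hostname) to login nodes.
# Arguments are nid:hostname pairs, for example 4:login1. Pass nids without
# leading zeros, since printf may treat a leading 0 as octal.
print_login_acls() {
    echo 'qmgr -c "set server acl_host_enable = true"'
    for pair in "$@"; do
        nid=${pair%%:*}
        host=${pair#*:}
        # nid hostnames are zero-padded to five digits (4 -> nid00004).
        printf 'qmgr -c "set server acl_hosts += nid%05d"\n' "$nid"
        printf 'qmgr -c "set server submit_hosts += %s"\n' "$host"
    done
}
```

After checking the output of `print_login_acls 4:login1 64:login2 68:login3`, you can pipe it to `sh` on the pbs_server host.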


Define your login nodes to Torque

Define your login nodes to Torque. You should set np to the number of cores on your system.

Example 9. Populate the nodes file

# vi /var/spool/torque/server_priv/nodes

login1 np=128
login2 np=128
login3 np=128
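The nodes file can also be written non-interactively, which is convenient when it needs to be regenerated. This is a minimal sketch, assuming the one-line-per-node "hostname np=N" format shown above; the function name write_nodes_file is hypothetical.

```shell
# Hypothetical helper: write a Torque nodes file with one "host np=N" line
# per login node.
#   $1     path of the nodes file (normally /var/spool/torque/server_priv/nodes)
#   $2     value to use for np
#   $3...  login node host names
write_nodes_file() {
    file=$1 np=$2
    shift 2
    : > "$file" || return 1   # truncate (or create) the file
    for host in "$@"; do
        printf '%s np=%s\n' "$host" "$np" >> "$file"
    done
}
```

For example: `write_nodes_file /var/spool/torque/server_priv/nodes 128 login1 login2 login3`. pbs_server reads this file when it starts, so restart the server after any change.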
        


Install the pbs_server init.d script on the server (Optional)

Torque provides an init.d script for starting pbs_server as a service.

Example 10. Copy in init.d script

# cd /rr/current/software/torque-2.2.0

# cp contrib/init.d/pbs_server /etc/init.d

# chmod +x /etc/init.d/pbs_server

Edit the init.d script as necessary, for example to set PBS_DAEMON and PBS_HOME to match your installation.

# vi /etc/init.d/pbs_server

PBS_DAEMON=/opt/torque/default/sbin/pbs_server
PBS_HOME=/var/spool/torque
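Instead of editing the init.d scripts by hand, you can set the two variables with sed. This sketch assumes GNU sed (for -i) and that each variable is assigned at the start of its own line, as in the excerpt above. The function name set_pbs_init_vars is hypothetical.

```shell
# Hypothetical helper: rewrite the PBS_DAEMON and PBS_HOME assignments in a
# Torque init.d script in place (requires GNU sed for -i).
#   $1  init.d script to edit
#   $2  full path of the daemon binary
#   $3  Torque home directory (PBS_HOME)
set_pbs_init_vars() {
    sed -i \
        -e "s|^PBS_DAEMON=.*|PBS_DAEMON=$2|" \
        -e "s|^PBS_HOME=.*|PBS_HOME=$3|" \
        "$1"
}
```

For example: `set_pbs_init_vars /etc/init.d/pbs_server /opt/torque/default/sbin/pbs_server /var/spool/torque`. The same call works for contrib/init.d/pbs_mom if you pass /opt/torque/default/sbin/pbs_mom instead.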
        


Install the pbs_mom init.d script on the login nodes (Optional)

Torque provides an init.d script for starting pbs_mom as a service.

Example 11. Copy in init.d script

# cd /rr/current/software/torque-2.2.0

Edit the init.d script as necessary, for example to set PBS_DAEMON and PBS_HOME to match your installation.

# vi contrib/init.d/pbs_mom

PBS_DAEMON=/opt/torque/default/sbin/pbs_mom
PBS_HOME=/var/spool/torque
        

# pdcp -w login1,login2,login3 contrib/init.d/pbs_mom /etc/init.d

# pdsh -w login1,login2,login3 chmod +x /etc/init.d/pbs_mom


Install the module files (Optional)

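Moab provides module files that can be used to establish the proper Torque environment. Because the Torque module file ships in the Moab distribution, the Moab tarball must already be unpacked for this step (see the Moab Install Notes). You may wish to copy these module files out to the login nodes as well.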

Example 12. Copy in module files

# cd /rr/current/software/moab-5.2.2.s10021

# mkdir /etc/modulefiles/torque

# cp contrib/modulefiles/torque /etc/modulefiles/torque/.module

# cd /etc/modulefiles/torque

# ln -s .module torque-2.2.0

# vi .version

#%Module1.0
set ModulesVersion      "torque-2.2.0"
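The .version file can also be created without an editor by using a here-document. This sketch assumes the usual Environment Modules convention that a .version file setting ModulesVersion selects the default version in that directory. The function name write_module_version is hypothetical.

```shell
# Hypothetical helper: make <version> the default module in a modulefile
# directory by writing an Environment Modules .version file.
#   $1  modulefile directory (for example /etc/modulefiles/torque)
#   $2  module version name (for example torque-2.2.0)
write_module_version() {
    cat > "$1/.version" <<EOF
#%Module1.0
set ModulesVersion      "$2"
EOF
}
```

For example: `write_module_version /etc/modulefiles/torque torque-2.2.0`.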
        


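Stop the Torque server

The torque.setup script leaves pbs_server running. Stop it now; you will start it again once the pbs_moms are running.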

Example 13. Stop Torque

# /opt/torque/default/bin/qterm

Alternatively, if you installed the init.d script, you may run:

# service pbs_server stop


Start up the Torque Mom Daemons

On the boot node as root:

Example 14. Start up the pbs_moms on the login nodes.

# pdsh -w login1,login2,login3 /opt/torque/default/sbin/pbs_mom

Alternatively, if you installed the init.d script, you may run:

# pdsh -w login1,login2,login3 /sbin/service pbs_mom start


Start up the Torque Server

On the torque server host as root:

Example 15. Start pbs_server

# /opt/torque/default/sbin/pbs_server

Alternatively, if you installed the init.d script, you may run:

# service pbs_server start
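Once pbs_server and the moms are running, each login node should report the state free. This sketch summarizes node states from pbsnodes -a output. It assumes the usual layout, in which each node name starts a record at column one and is followed by an indented "state = ..." line. The function name node_states is hypothetical.

```shell
# Hypothetical helper: read `pbsnodes -a` output on stdin and print
# "<node> <state>" for each node record.
node_states() {
    awk '
        /^[^ \t]/        { node = $1 }        # record header: node name
        /^[ \t]+state =/ { sub(/^[ \t]+state = /, ""); print node, $0 }
    '
}
```

Run `pbsnodes -a | node_states`. For any node not reported as free, check that its pbs_mom is running and that its server_name file names the sdb node.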


Moab Install Notes

Install Torque

If Torque is not already installed on your system, follow the Torque Installation Notes above to install Torque on the sdb node.


Download the latest Moab release

Download the latest Moab release from Cluster Resources, Inc.

Note: The correct tarball type can be recognized by the xt4 tag in its name.

Example 16. Download Moab

# cd /rr/current/software

# wget --http-user=user --http-passwd=passwd http://www.clusterresources.com/downloads/mwm/temp/moab-5.2.2.s10021-linux-x86_64-torque2-xt4.tar.gz


Unpack the Moab tarball

Using xtopview, unpack the Moab tarball into the software directory in the shared root.

Example 17. Unpack Moab

# xtopview

default/:/ # cd /software

default/:/software # tar -zxvf moab-5.2.2.s10021-linux-x86_64-torque2-xt4.tar.gz


Configure Moab

While still in xtopview, run configure with the options set appropriately for your installation. Run ./configure --help to see a list of configure options. CRI recommends installing the Moab binaries into /opt/moab/$version and creating a symbolic link to it at /opt/moab/default. Because the Moab home directory must be writable by root, CRI recommends placing it in a location such as /var/spool/moab.

Example 18. Run configure

default/:/software # cd moab-5.2.2.s10021

default/:/software/moab-5.2.2.s10021 # autoconf

default/:/software/moab-5.2.2.s10021 # ./configure --prefix=/opt/moab/5.2.2.s10021 --with-homedir=/var/spool/moab --with-torque


Compile and Install Moab

While still in xtopview, install moab into the shared root. You may also need to link /opt/moab/default to this installation.

Example 19. Make Install

default/:/software/moab-5.2.2.s10021 # make install

default/:/software/moab-5.2.2.s10021 # ln -sf /opt/moab/5.2.2.s10021/ /opt/moab/default


Install the module files (Optional)

Moab provides a module file that can be used to establish the proper Moab environment. You may also want to install this module file on the login nodes.

Example 20. make modulefiles

default/:/software/moab-5.2.2.s10021 # make modulefiles


Install the Perl XML Modules and exit xtopview

Moab's native resource manager interface scripts require a Perl XML module, XML::LibXML, to communicate through the BASIL interface. By default, the perldeps make target installs a bundled version of this module into a local Moab lib directory. Alternatively, you can download the module from CPAN and install it yourself. Exit xtopview when done.

Example 21. make perldeps

default/:/software/moab-5.2.2.s10021 # make perldeps

default/:/software/moab-5.2.2.s10021 # exit


Copy your moab home directory to your moab server host

In this example we assume the Moab server will run on the sdb node (nid 3). If you install Moab with its home directory in /var, as in this example, and your /var filesystem is served from the boot node under /snv, log in to sdb and determine its nid with 'cat /proc/cray_xt/nid'. The Moab home directory is then copied to /snv/<nid>/var/spool on the boot node.

Example 22. Copy out moab home directory

# cd /rr/current/var/spool

# cp -pr moab /snv/3/var/spool


Customize the moab configuration file for your moab server host

The moab.cfg file should be customized for your scheduling environment. See the Moab Admin Guide for more details.

Example 23. Edit the moab configuration file

# cd /snv/3/var/spool/moab

# vi moab.cfg

SCHEDCFG[moab]     SERVER=sdb:42559

TOOLSDIR           /opt/moab/default/tools

RMCFG[clustername] TYPE=NATIVE:XT4

NODECFG[DEFAULT]   OS=linux ARCH=XT
NODEACCESSPOLICY   SINGLEJOB
        


Copy the Moab configuration file to all of the login nodes

On the login nodes, only the SCHEDCFG line is essential, because it tells the client commands where to find the Moab server. This example assumes that you are using persistent /var filesystems mounted from /snv on the boot node and that your login nodes have nids 4, 64 and 68. If your login nodes instead use a RAM /var filesystem, it is populated from a skeleton tarball on the boot node (/rr/current/.shared/var-skel.tgz), and these files must be added to that tarball.

Example 24. Copy out the configuration file

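# for i in 4 64 68; do mkdir -p /snv/$i/var/spool/moab; cp moab.cfg /snv/$i/var/spool/moab; done

Optionally, trim each login node's copy down to the essential lines. For example:

# vi /snv/4/var/spool/moab/moab.cfg

SCHEDCFG[moab]   SERVER=sdb:42559
RMCFG[clustername] TYPE=NATIVE:XT4
NODECFG[DEFAULT] OS=linux ARCH=XT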
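Because the login nodes depend on the SCHEDCFG line to find the server, it can be worth checking that every copy agrees with the server's copy. This is a minimal sketch, assuming the /snv layout and nids used above. Whitespace is normalized, so differences in column alignment are ignored. The function names schedcfg_of and check_schedcfg and the SNV_ROOT variable are hypothetical.

```shell
# Hypothetical helper: print the SCHEDCFG line(s) of a moab.cfg with runs of
# whitespace collapsed to single spaces (rebuilding $0 via $1 = $1).
schedcfg_of() {
    awk '/^SCHEDCFG/ { $1 = $1; print }' "$1" 2>/dev/null
}

# Hypothetical helper: compare each login node's SCHEDCFG with the server's.
#   $1     server nid (for example 3)
#   $2...  login node nids (for example 4 64 68)
# SNV_ROOT defaults to /snv. Returns non-zero if any copy differs.
check_schedcfg() {
    root=${SNV_ROOT:-/snv}
    ref=$(schedcfg_of "$root/$1/var/spool/moab/moab.cfg")
    if [ -z "$ref" ]; then
        echo "no SCHEDCFG line in server moab.cfg" >&2
        return 2
    fi
    shift
    rc=0
    for nid in "$@"; do
        if [ "$(schedcfg_of "$root/$nid/var/spool/moab/moab.cfg")" != "$ref" ]; then
            echo "SCHEDCFG mismatch or missing on nid $nid" >&2
            rc=1
        fi
    done
    return $rc
}
```

For example: `check_schedcfg 3 4 64 68`.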
        


Customize the XT4 native resource manager interface configuration file

By default, the native resource manager interface tools are located in the $prefix/tools directory. They consist of a configuration file (config.xt4.pl) and various scripts (job.query.xt4.pl, node.query.xt4.pl, job.start.xt4.pl, job.cancel.xt4.pl, ...). Edit the configuration file to match your system environment.

Example 25. Edit the XT4 configuration file

# cd /rr/current/opt/moab/default/tools

# vi config.xt4.pl

$ENV{PATH} = "/opt/torque/default/bin:/usr/bin:$ENV{PATH}";
$loginPattern = "^login"; # These are the login nodes used by interactive jobs
$yodPattern   = "^login"; # These are the nodes running pbs_mom
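$loginPattern and $yodPattern are Perl regular expressions. Assuming they are matched against node host names (as the comments suggest), you can preview which hosts a pattern selects. This sketch uses grep -E. POSIX extended regular expressions agree with Perl for simple anchored patterns such as ^login, but not for Perl-only syntax. The function name match_hosts is hypothetical.

```shell
# Hypothetical helper: print the host names (one per argument) that match a
# pattern such as "^login". Uses POSIX ERE via grep -E, which only
# approximates Perl regular expression semantics.
match_hosts() {
    pattern=$1
    shift
    printf '%s\n' "$@" | grep -E -- "$pattern"
}
```

For example, `match_hosts '^login' login1 login2 login3 nid00003` should list only the three login nodes.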
        


Install the moab init.d script (Optional)

Moab provides an init.d script for starting Moab as a service. Using xtopview on the sdb node (node 3 in this example), copy the init script into /etc/init.d and mark it as specific to that node with xtspec.

Example 26. Copy in init.d script to the sdb node from the shared root.

# xtopview -n 3

node/3:/ # cp /software/moab-5.2.2.s10021/contrib/init.d/moab /etc/init.d/

node/3:/ # xtspec /etc/init.d/moab

node/3:/ # exit


Set the proper environment

The MOABHOMEDIR environment variable must be set in your environment when starting moab or using moab commands. You will also want to adjust your path to include the moab and torque bin and sbin directories. The proper environment can be established by loading the appropriate moab module, by sourcing properly edited login files, or by directly modifying your environment variables.

Example 27. Loading the moab module

# module load moab

Example 28. Exporting the environment variables by hand (in bash)

# export MOABHOMEDIR=/var/spool/moab

# export PATH=$PATH:/opt/moab/default/bin:/opt/moab/default/sbin:/opt/torque/default/bin:/opt/torque/default/sbin
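If you put these exports in a login file that may be sourced more than once, a guard keeps PATH from accumulating duplicate entries. This is a sketch for sh or bash; the function name path_append is hypothetical.

```shell
# Hypothetical helper: append a directory to PATH only if it is not already
# present.
path_append() {
    case ":$PATH:" in
        *":$1:"*) ;;                       # already present: no change
        *) PATH=${PATH:+$PATH:}$1 ;;
    esac
}

export MOABHOMEDIR=/var/spool/moab
for dir in /opt/moab/default/bin /opt/moab/default/sbin \
           /opt/torque/default/bin /opt/torque/default/sbin; do
    path_append "$dir"
done
export PATH
```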


Start up the Moab Workload Manager

Start up the moab daemon.

Example 29. Start Moab

# /opt/moab/default/sbin/moab

Alternatively, if you installed the init.d script, you may run:

# service moab start


Torque Upgrade Notes

Quiesce the system

It is preferable to have no running jobs during the upgrade. You can achieve this by closing all queues in Torque, or by setting a system reservation in Moab, and then waiting for all jobs to complete. It is often possible to upgrade Torque with jobs still running. However, you then risk problems if jobs complete while Torque is down, or if the file formats and job states of the new version are incompatible with those of the old one.
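Before shutting anything down, you can confirm that the system has drained. This is a minimal sketch, assuming qstat's default tabular output (two header lines, then one job per line with the state in the fifth column). The function name count_running_jobs is hypothetical.

```shell
# Hypothetical helper: count running jobs (state "R") in `qstat` output read
# from stdin. Assumes the default qstat layout: two header lines, then one
# job per line with the job state in column 5.
count_running_jobs() {
    awk 'NR > 2 && $5 == "R" { n++ } END { print n + 0 }'
}
```

To stop a queue from accepting new jobs, disable it first, for example with `qmgr -c "set queue batch enabled = false"`, and set it back to true after the upgrade. Then wait until `qstat | count_running_jobs` prints 0.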


Shutdown the Torque Mom Daemons

On the boot node as root:

Example 30. Shut down the pbs_moms on the login nodes.

# pdsh -w login1,login2,login3 /opt/torque/default/sbin/momctl -s

Alternatively, if you installed the init.d script, you may run:

# pdsh -w login1,login2,login3 /sbin/service pbs_mom stop


Stop the Torque server

Example 31. Stop Torque

# /opt/torque/default/bin/qterm

Alternatively, if you installed the init.d script, you may run:

# service pbs_server stop


Download the latest Torque release

Download the latest Torque release from Cluster Resources, Inc.

Example 32. Download Torque

# cd /rr/current/software

# wget http://www.clusterresources.com/downloads/torque/torque-2.2.0.tar.gz


Unpack the Torque tarball

Using xtopview, unpack the Torque tarball into the software directory in the shared root.

Example 33. Unpack Torque

# xtopview

default/:/ # cd /software

default/:/software # tar -zxvf torque-2.2.0.tar.gz


Configure Torque

While still in xtopview, run configure with the options set appropriately for your installation. Run ./configure --help to see a list of configure options. CRI recommends installing the Torque binaries into /opt/torque/$version and creating a symbolic link to it at /opt/torque/default. At a minimum, you must specify the hostname of the Torque server host (--with-default-server) if it differs from the host on which Torque is compiled. For XT4 installations, the Torque server host is normally the sdb node.

Example 34. Run configure

default/:/software # cd torque-2.2.0

default/:/software/torque-2.2.0 # ./configure --prefix=/opt/torque/2.2.0 --with-server-home=/var/spool/torque --with-default-server=nid00003 --enable-syslog


Compile and Install Torque

While still in xtopview, compile and install torque into the shared root. You may also need to link /opt/torque/default to this installation. Exit xtopview.

Example 35. Make and Make Install

default/:/software/torque-2.2.0 # make

default/:/software/torque-2.2.0 # make packages

default/:/software/torque-2.2.0 # make install

default/:/software/torque-2.2.0 # rm /opt/torque/default

default/:/software/torque-2.2.0 # ln -sf /opt/torque/2.2.0/ /opt/torque/default

default/:/software/torque-2.2.0 # exit
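The rm before ln matters. When /opt/torque/default already points to a directory, GNU ln -sf follows the link and creates the new link inside the old version's directory instead of replacing it. The sketch below replaces the link in one step with ln -sfn (the -n/--no-dereference option of GNU coreutils). The function name switch_default is hypothetical.

```shell
# Hypothetical helper: point a "default" symlink at a new version directory.
# -n (--no-dereference) makes ln replace an existing symlink to a directory
# instead of creating the new link inside that directory.
#   $1  new version directory (for example /opt/torque/2.2.0)
#   $2  symlink to update     (for example /opt/torque/default)
switch_default() {
    if [ -e "$2" ] && [ ! -L "$2" ]; then
        echo "$2 exists and is not a symlink; refusing to replace it" >&2
        return 1
    fi
    ln -sfn "$1" "$2"
}
```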


Start up the Torque Mom Daemons

Note: If you still have running jobs, start pbs_mom with the -p flag to preserve them. By default, the init.d startup script does not preserve running jobs unless it is modified to start pbs_mom with the -p flag.

On the boot node as root:

Example 36. Start up the pbs_moms on the login nodes.

# pdsh -w login1,login2,login3 /opt/torque/default/sbin/pbs_mom -p


Start up the Torque Server

On the torque server host as root:

Example 37. Start pbs_server

# /opt/torque/default/sbin/pbs_server

Alternatively, if you installed the init.d script, you may run:

# service pbs_server start


Moab Upgrade Notes

Quiesce the system

It is preferable to have no running jobs during the upgrade. This can be done by setting a system reservation in Moab and waiting for all jobs to complete. Often, it is possible to upgrade Moab with running jobs in the system, but you may risk problems associated with Moab being down when the jobs complete.


Shutdown the Moab Workload Manager

Shut down the moab daemon.

Example 38. Stop Moab

# /opt/moab/default/bin/mschedctl -k

Alternatively, if you installed the init.d script, you may run:

# service moab stop


Download the latest Moab release

Download the latest Moab release from Cluster Resources, Inc.

Note: The correct tarball type can be recognized by the xt4 tag in its name.

Example 39. Download Moab

# cd /rr/current/software

# wget --http-user=user --http-passwd=passwd http://www.clusterresources.com/downloads/mwm/temp/moab-5.2.2.s10021-linux-x86_64-torque2-xt4.tar.gz


Unpack the Moab tarball

Using xtopview, unpack the Moab tarball into the software directory in the shared root.

Example 40. Unpack Moab

# xtopview

default/:/ # cd /software

default/:/software # tar -zxvf moab-5.2.2.s10021-linux-x86_64-torque2-xt4.tar.gz


Configure Moab

While still in xtopview, run configure with the options set appropriately for your installation. Run ./configure --help to see a list of configure options. CRI recommends installing the Moab binaries into /opt/moab/$version and creating a symbolic link to it at /opt/moab/default. Because the Moab home directory must be writable by root, CRI recommends placing it in a location such as /var/spool/moab.

Example 41. Run configure

default/:/software # cd moab-5.2.2.s10021

default/:/software/moab-5.2.2.s10021 # autoconf

default/:/software/moab-5.2.2.s10021 # ./configure --prefix=/opt/moab/5.2.2.s10021 --with-homedir=/var/spool/moab --with-torque


Compile and Install Moab

While still in xtopview, install moab into the shared root. You may also need to link /opt/moab/default to this installation.

Example 42. Make Install

default/:/software/moab-5.2.2.s10021 # make install

default/:/software/moab-5.2.2.s10021 # rm /opt/moab/default

default/:/software/moab-5.2.2.s10021 # ln -sf /opt/moab/5.2.2.s10021/ /opt/moab/default


Install the Perl XML Modules and exit xtopview

If you previously installed the Perl modules in the Perl site directories (configure --with-perl-libs=site), you should not need to reinstall them. By default, however, the Perl modules are installed locally in the Moab install directory. Because a Moab upgrade is normally configured with a new install directory (configure --prefix), you will generally need to reinstall them. Exit xtopview when done with this step.

Example 43. make perldeps

default/:/software/moab-5.2.2.s10021 # make perldeps

default/:/software/moab-5.2.2.s10021 # exit


Manually merge any changes from the new XT4 native resource manager interface configuration file

If the upgrade introduces changes to the config.xt4.pl template, edit your config.xt4.pl and manually merge in the changes from the new config.xt4.pl.dist file. One way to find out whether anything has changed is to diff config.xt4.pl.dist in the old and new tools directories. Such changes are rare, but they do occur. Missing changes are usually discovered quickly, because the xt4 scripts generally fail if the config file has not been updated.

Example 44. Merge any updates into the XT4 configuration file

# cd /rr/current/opt/moab/default/tools

# diff ../../5.2.2.s10009/tools/config.xt4.pl.dist config.xt4.pl.dist

# vi config.xt4.pl
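To decide whether a merge is needed at all, compare the old and new distributed templates. This is a sketch using cmp -s. The two arguments are the paths of the old and new config.xt4.pl.dist files, and the function name dist_changed is hypothetical.

```shell
# Hypothetical helper: report whether the distributed config.xt4.pl template
# changed between two Moab installs.
#   $1  old config.xt4.pl.dist
#   $2  new config.xt4.pl.dist
# Returns 0 and prints a unified diff if they differ, 1 if they are
# identical, and 2 if either file cannot be read.
dist_changed() {
    if [ ! -r "$1" ] || [ ! -r "$2" ]; then
        echo "cannot read $1 or $2" >&2
        return 2
    fi
    if cmp -s "$1" "$2"; then
        return 1
    fi
    diff -u "$1" "$2"
    return 0
}
```

If dist_changed returns 1, your existing config.xt4.pl settings can usually be carried over unchanged.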


Reload the new environment

Example 45. Swapping in the new moab module

# module swap moab/5.2.2.s10021


Start up the Moab Workload Manager

Start up the moab daemon.

Example 46. Start Moab

# /opt/moab/default/sbin/moab

Alternatively, if you installed the init.d script, you may run:

# service moab start