TORQUE Quick Start Guide

1.0 Initial Installation

  • Download the TORQUE distribution file from http://clusterresources.com/downloads/torque
  • Extract and build the distribution on the machine that will act as the "TORQUE server" - the machine that will monitor and control all compute nodes by running the pbs_server daemon.  See the example below:

    > tar -xzvf torque.tar.gz
    > cd torque
    > ./configure
    > make
    > make install
    
  • (OPTIONAL) Set the PATH environment variable.  The default installation directories for the binaries are /usr/local/bin and /usr/local/sbin.
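
As a minimal sketch, assuming the default /usr/local prefix was not overridden when running ./configure, the two directories can be added to PATH for the current shell session as follows:

```shell
# Prepend the default TORQUE install directories to PATH for this session.
# Assumption: the default /usr/local prefix was used; adjust the paths if
# ./configure was run with a different --prefix.
PATH="/usr/local/bin:/usr/local/sbin:$PATH"
export PATH
```

To keep the setting across logins, the same lines could be placed in a shell startup file such as ~/.profile (which file applies depends on your shell).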

NOTE: In this document $(TORQUECFG) corresponds to where TORQUE stores its configuration files.  This defaults to /usr/spool/PBS.

2.0 Initialize/Configure TORQUE on the Server (pbs_server)

  • Once installation on the TORQUE server is complete, configure the pbs_server daemon by running torque.setup <USER>, a script packaged with the distribution source code, where <USER> is the username that will act as the TORQUE administrator. (The server can also be configured manually; see the manual configuration instructions.) This script sets up a basic batch queue to get you started. If you experience problems, make sure that the most recent TORQUE executables are the ones being run and that they are in your current PATH.
  • Proper server configuration can be verified by following the server verification steps (for example, the queue check in Section 8.0).
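
The setup step can be sketched as a small shell function. This is an illustration only: it assumes the current directory is the extracted TORQUE source tree, that the TORQUE commands are already in PATH, and that it is run by a user allowed to administer the server. The function name setup_torque_server is hypothetical.

```shell
# Sketch: run the bundled setup script, then print the resulting server
# configuration so the new queue can be inspected.
setup_torque_server() {
    admin_user="$1"                           # username that will act as TORQUE admin
    ./torque.setup "$admin_user" || return 1  # stop here if the setup script fails
    qmgr -c 'print server'                    # dump server and queue settings
}

# Example (hypothetical admin user):
#   setup_torque_server root
```

The qmgr output should list the batch queue that torque.setup created.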

3.0 Install TORQUE on the Compute Nodes

To install TORQUE on a compute node, do the following on each machine (see page 19, Section 3.2.1 of PBS Administrator's Manual for full details):
  • Install the pbs_mom daemon on each compute node. (NOTE: The TORQUE server can optionally also act as a compute node by running a pbs_mom alongside the pbs_server daemon.) The pbs_mom daemon can be installed either by building and installing the distribution as was done on the TORQUE server or, to simplify installation, by reusing the TORQUE server's source and already compiled binaries. To reuse the server's binaries, the compute node needs access to the server's TORQUE source tree (via NFS mount, etc.). To install the pbs_mom daemon (required on all compute nodes), enter the src/resmom directory of that source tree and run make install as root on the compute node. Optionally, install the client commands (pbsnodes, etc.) as well by entering the src/cmds directory and running make install as root.
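
The shared-source-tree variant can be sketched as follows. The function name install_mom and the mount point in the example are hypothetical; the sketch assumes the server's already built source tree is visible on the compute node and that the function is run as root there.

```shell
# Sketch: install pbs_mom (and the client commands) on a compute node from
# the TORQUE server's already compiled source tree.
install_mom() {
    torque_src="$1"    # path where the server's source tree is mounted
    # Required on every compute node: the pbs_mom daemon.
    (cd "$torque_src/src/resmom" && make install) || return 1
    # Optional: client commands such as pbsnodes.
    (cd "$torque_src/src/cmds" && make install)
}

# Example (hypothetical NFS mount point):
#   install_mom /mnt/torque
```

Each make install runs in a subshell, so the caller's working directory is left unchanged.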

4.0 Configure TORQUE on the Compute Nodes

  • Begin by editing the $(TORQUECFG)/mom_priv/config file on each node.  Recommended settings are as follows:

    mom_priv/config
    $clienthost     10.10.10.100    # note: IP address of host running pbs_server
    $logevent       255
    $restricted     10.10.10.100    # note: IP address of host running pbs_server
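
When many nodes need the same file, the recommended settings can be written by a short script. The following is a sketch only: the function name write_mom_config is hypothetical, and the configuration directory is passed in explicitly rather than assumed.

```shell
# Sketch: write the recommended mom_priv/config for one compute node.
write_mom_config() {
    cfg_dir="$1"      # TORQUE configuration directory, i.e. $(TORQUECFG)
    server_ip="$2"    # IP address of the host running pbs_server
    mkdir -p "$cfg_dir/mom_priv" || return 1
    # \$ keeps the leading dollar sign literal inside the here-document.
    cat > "$cfg_dir/mom_priv/config" <<EOF
\$clienthost     $server_ip
\$logevent       255
\$restricted     $server_ip
EOF
}

# Example, assuming the default configuration directory:
#   write_mom_config /usr/spool/PBS 10.10.10.100
```

Note that the function overwrites any existing mom_priv/config, so it is best suited to freshly installed nodes.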

5.0 Configure Data Management on the Compute Nodes

Data management allows job data to be staged in and out, that is, copied between the server and the compute nodes.
  • For shared filesystems (e.g., NFS, DFS, AFS), use the $usecp parameter in the mom_priv/config files to specify how to map a user's home directory.
    (Example: $usecp gridmaster.tmx.com:/home /home)
  • For local, non-shared filesystems, rcp or scp must be configured to allow direct copy without prompting for passwords (key authentication, etc.).
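
For the non-shared case, one way to confirm that scp works without prompting is to attempt a copy with the ssh option BatchMode=yes, which makes the connection fail rather than ask for a password. The sketch below assumes scp is the copy tool in use and that key authentication has already been set up for the user running it; the function name check_passwordless_scp is hypothetical.

```shell
# Sketch: check that a small file can be copied to a host without any
# password prompt. Leaves a small empty probe file in /tmp on the target.
check_passwordless_scp() {
    target_host="$1"
    probe=$(mktemp) || return 1
    # BatchMode=yes: fail instead of prompting for a password.
    if scp -q -o BatchMode=yes "$probe" "$target_host:/tmp/"; then
        echo "OK: $target_host accepts key-based copies"
        status=0
    else
        echo "FAIL: $target_host would prompt for a password or is unreachable"
        status=1
    fi
    rm -f "$probe"
    return "$status"
}

# Example (hostname taken from the nodes file example in this guide):
#   check_passwordless_scp computenode001.cluster.org
```

Run the check as the user who will submit jobs, in both directions between the server and each compute node.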

6.0 Update TORQUE Server Configuration

  • On the TORQUE server, append the list of newly configured compute nodes to the $(TORQUECFG)/server_priv/nodes file:

    server_priv/nodes
    computenode001.cluster.org
    computenode002.cluster.org
    computenode003.cluster.org
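
Appending the node names can be scripted. The sketch below assumes each line of the nodes file holds a bare hostname, as in the example above, and skips names that are already listed; the function name add_nodes is hypothetical.

```shell
# Sketch: append node names to the nodes file, skipping exact duplicates.
add_nodes() {
    nodes_file="$1"
    shift
    for node in "$@"; do
        # -x matches the whole line, -F treats the name as a literal string.
        grep -qxF "$node" "$nodes_file" 2>/dev/null || echo "$node" >> "$nodes_file"
    done
}

# Example, assuming the default configuration directory /usr/spool/PBS:
#   add_nodes /usr/spool/PBS/server_priv/nodes \
#       computenode001.cluster.org computenode002.cluster.org
```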
    

7.0 Start the pbs_mom Daemons on Compute Nodes

  • Next start the pbs_mom daemon on each compute node by running the pbs_mom executable.
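
On larger clusters the daemons can be started from one machine. The following sketch assumes root can reach each node over ssh without a password prompt and that pbs_mom is in the remote PATH; neither is guaranteed by the steps above. The function name start_moms is hypothetical.

```shell
# Sketch: start pbs_mom on each listed compute node over ssh.
start_moms() {
    for node in "$@"; do
        # Report failures but keep going with the remaining nodes.
        ssh "$node" pbs_mom || echo "failed to start pbs_mom on $node" >&2
    done
}

# Example:
#   start_moms computenode001.cluster.org computenode002.cluster.org
```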

8.0 Verifying Correct TORQUE Installation

The pbs_server daemon was started on the TORQUE server when the torque.setup script was executed or when the server was manually configured.  It must now be restarted so that it loads the updated configuration.

# shut down server
> qterm -t quick

# start server
> pbs_server

# verify all queues are properly configured
> qstat -q

# verify all nodes are correctly reporting
> pbsnodes -a 

# submit a basic job
> echo "sleep 30" | qsub

# verify jobs display
> qstat

At this point, the job will not start because there is no scheduler running.  The scheduler is enabled in the next step.

9.0 Enabling the Scheduler

Selecting the cluster scheduler is an important decision and significantly affects cluster utilization, responsiveness, availability, and intelligence.  The default TORQUE scheduler, pbs_sched, is very basic and will provide poor utilization of your cluster's resources.  Other options, such as Maui Scheduler or Moab Workload Manager, are highly recommended.  If using Maui/Moab, refer to the Moab-PBS Integration Guide.  If using pbs_sched, start this daemon now.

NOTE: If you are installing ClusterSuite, TORQUE and Moab were configured at installation for interoperability and no further action is required.

10.0 Startup/Shutdown Service Script for TORQUE/Moab (OPTIONAL)

An optional startup/shutdown service script is provided as an example of how to run TORQUE as an OS service that starts at bootup.
  • Download the script. (NOTE: This script was written specifically for Red Hat variants and may require modification to work with other Linux/UNIX distributions.)
  • Place the file in the /etc/init.d/ directory.
  • Make symbolic links (S99moab and K15moab, for example) in the desired runlevel directories (e.g., /etc/rc.d/rc3.d/ on Red Hat).
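
The last two steps can be sketched as follows. The script file name used in the example (/etc/init.d/moab) is hypothetical, since the name of the downloaded file is not given here, and marking the script executable is an added assumption about what the init system requires. The function name install_service_links is also hypothetical.

```shell
# Sketch: create start (S99moab) and kill (K15moab) links for one runlevel.
install_service_links() {
    script="$1"    # script already placed in /etc/init.d/
    rc_dir="$2"    # runlevel directory, e.g. /etc/rc.d/rc3.d
    chmod +x "$script" || return 1               # assumption: must be executable
    ln -s "$script" "$rc_dir/S99moab" || return 1
    ln -s "$script" "$rc_dir/K15moab"
}

# Example (hypothetical script name, run as root):
#   install_service_links /etc/init.d/moab /etc/rc.d/rc3.d
```

Repeat the call for each runlevel in which the service should start.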