TORQUE Resource Manager

TORQUE Administrator's Manual - Appendix L: TORQUE Quick Start Guide

L.1 Initial Installation

  • Download the TORQUE distribution file from http://clusterresources.com/downloads/torque
  • Extract and build the distribution on the machine that will act as the "TORQUE server" - the machine that will monitor and control all compute nodes by running the pbs_server daemon.  See the example below:

    > tar -xzvf torque.tar.gz
    > cd torque
    > ./configure
    > make
    > make install
    

    OSX 10.4 users need to change the #define __TDARWIN in src/include/pbs_config.h to #define __TDARWIN_8.

  • (OPTIONAL) Set the PATH environment variable.  The default installation directories for the binaries are /usr/local/bin and /usr/local/sbin.

    In this document $(TORQUECFG) corresponds to where TORQUE stores its configuration files.  This defaults to /var/spool/torque.
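
    For the optional PATH step, a minimal sketch for a Bourne-style shell follows. It assumes the default /usr/local prefix was kept; TORQUE_PREFIX is a variable name introduced here for illustration and is not a TORQUE setting.

```shell
# Assumed install prefix; change it if ./configure was run with --prefix.
TORQUE_PREFIX="${TORQUE_PREFIX:-/usr/local}"

# Prepend sbin and bin to PATH unless they are already present.
for dir in "$TORQUE_PREFIX/sbin" "$TORQUE_PREFIX/bin"; do
  case ":$PATH:" in
    *":$dir:"*) ;;                # already on PATH, nothing to do
    *) PATH="$dir:$PATH" ;;
  esac
done
export PATH

# Directory that this document refers to as $(TORQUECFG).
TORQUECFG="${TORQUECFG:-/var/spool/torque}"
export TORQUECFG
```

    Placing these lines in a shell startup file such as ~/.profile would make the setting persistent for that user.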

L.2 Initialize/Configure TORQUE on the Server (pbs_server)

  • Once installation on the TORQUE server is complete, configure the pbs_server daemon by executing the command torque.setup <USER>, which is packaged with the distribution source code, where <USER> is a username that will act as the TORQUE admin. (The server can also be configured manually.) This script will set up a basic batch queue to get you started. If you experience problems, make sure that the most recent TORQUE executables are the ones being executed and that they are in your current PATH.
  • If doing this step manually, be certain to run the command 'pbs_server -t create' to create the new batch database.  If this step is not taken, the pbs_server daemon will be unable to start.
  • Proper server configuration can be verified by following the steps listed in Section 1.4 Testing.
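
When torque.setup reports problems, one quick check is to see which executables the shell actually resolves. The helper below is a sketch only; torque_which is a hypothetical name introduced here and is not part of TORQUE.

```shell
# Report where each named command resolves on the current PATH.
torque_which() {
  for cmd in "$@"; do
    if path=$(command -v "$cmd" 2>/dev/null); then
      echo "$cmd -> $path"
    else
      echo "$cmd: not found in PATH"
    fi
  done
}

# Commands used in this guide on the TORQUE server.
torque_which pbs_server qmgr qstat
```

If a command resolves to an older installation, or is not found at all, adjust PATH before running torque.setup <USER> again.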

L.3 Install TORQUE on the Compute Nodes

To configure a compute node do the following on each machine (see page 19, Section 3.2.1 of PBS Administrator's Manual for full details):
  • Create the self-extracting, distributable packages with make packages (see the INSTALL file for additional options and features of the distributable packages) and use the parallel shell command from your cluster management suite to copy and execute the package on all nodes (e.g., xCAT users might do prcp torque-package-linux-i686.sh main:/tmp/; psh main /tmp/torque-package-linux-i686.sh --install). Optionally, distribute and install the clients package.

L.4 Configure TORQUE on the Compute Nodes

  • For each compute host, the MOM daemon must be configured to trust the pbs_server daemon.  In TORQUE 2.0.0p4 and earlier, this is done by creating the $(TORQUECFG)/mom_priv/config file and setting the $pbsserver parameter.  In TORQUE 2.0.0p5 and later, this can also be done by creating the $(TORQUECFG)/server_name file and placing the server hostname inside.
  • Additional config parameters may be added to $(TORQUECFG)/mom_priv/config (See the MOM Config page for details.)
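
The two trust mechanisms described above could be written as follows. This is a sketch under stated assumptions: the hostname headnode.cluster.org is hypothetical, and TORQUECFG defaults to a scratch directory so the commands can be tried without touching a live installation.

```shell
# Scratch directory by default; on a real compute node set
# TORQUECFG=/var/spool/torque and run as root.
TORQUECFG="${TORQUECFG:-$(mktemp -d)}"
# Hypothetical server hostname, used for illustration only.
SERVER_HOST="${SERVER_HOST:-headnode.cluster.org}"

mkdir -p "$TORQUECFG/mom_priv"

# TORQUE 2.0.0p4 and earlier: name the server in mom_priv/config.
# Single quotes keep the shell from expanding $pbsserver.
# Note that '>' overwrites any existing config file.
printf '%s %s\n' '$pbsserver' "$SERVER_HOST" > "$TORQUECFG/mom_priv/config"

# TORQUE 2.0.0p5 and later: the server_name file may be used as well.
printf '%s\n' "$SERVER_HOST" > "$TORQUECFG/server_name"
```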

L.5 Configure Data Management on the Compute Nodes

Data management allows job data to be staged in and out, that is, copied between the server and the compute nodes.
  • For shared filesystems (e.g., NFS, DFS, AFS) use the $usecp parameter in the mom_priv/config files to specify how to map a user's home directory.
    (Example: $usecp gridmaster.tmx.com:/home /home)
  • For local, non-shared filesystems, rcp or scp must be configured to allow direct copy without prompting for passwords (for example, by using key authentication).
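
Both cases can be sketched in shell. The function names add_usecp and check_passwordless_scp are hypothetical helpers introduced here, and TORQUECFG defaults to a scratch directory for safe experimentation.

```shell
# Scratch directory by default; use /var/spool/torque on a real node.
TORQUECFG="${TORQUECFG:-$(mktemp -d)}"
MOM_CONFIG="$TORQUECFG/mom_priv/config"
mkdir -p "$TORQUECFG/mom_priv"
touch "$MOM_CONFIG"

# Shared filesystems: append a $usecp mapping unless it is already present.
add_usecp() {
  line="\$usecp $1 $2"
  grep -Fqx -- "$line" "$MOM_CONFIG" || printf '%s\n' "$line" >> "$MOM_CONFIG"
}

add_usecp 'gridmaster.tmx.com:/home' /home

# Non-shared filesystems: succeed only if scp works without a prompt.
# BatchMode=yes makes ssh fail instead of asking for a password.
# The copy leaves a small empty file in /tmp on the remote host.
check_passwordless_scp() {
  probe=$(mktemp) || return 1
  scp -q -o BatchMode=yes "$probe" "$1:/tmp/"
  status=$?
  rm -f "$probe"
  return "$status"
}
```

For example, check_passwordless_scp computenode001.cluster.org returns a non-zero status if key authentication is not yet in place for the calling user.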

L.6 Update TORQUE Server Configuration

  • On the TORQUE server, append the list of newly configured compute nodes to the $(TORQUECFG)/server_priv/nodes file:

    server_priv/nodes
    computenode001.cluster.org
    computenode002.cluster.org
    computenode003.cluster.org
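
    Appending by hand works for a few nodes. For longer lists, a sketch along the following lines avoids duplicate entries. The helper name add_nodes is hypothetical, and TORQUECFG defaults to a scratch directory rather than the live configuration.

```shell
# Scratch directory by default; use /var/spool/torque on the real server.
TORQUECFG="${TORQUECFG:-$(mktemp -d)}"
NODES_FILE="$TORQUECFG/server_priv/nodes"
mkdir -p "$TORQUECFG/server_priv"
touch "$NODES_FILE"

# Add each node name unless a line for it already exists.
add_nodes() {
  for node in "$@"; do
    # Compare the first field only, since a line may carry extra attributes.
    if ! awk -v n="$node" '$1 == n { found = 1 } END { exit !found }' "$NODES_FILE"; then
      printf '%s\n' "$node" >> "$NODES_FILE"
    fi
  done
}

add_nodes computenode001.cluster.org computenode002.cluster.org computenode003.cluster.org
```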
    

L.7 Start the pbs_mom Daemons on Compute Nodes

  • Next, start the pbs_mom daemon on each compute node by running the pbs_mom executable.
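
Sites without a parallel shell could loop over the nodes file with ssh. The sketch below assumes that root can reach each node over ssh and that pbs_mom is on the remote PATH. DRY_RUN and start_moms are names introduced here, and a two-node sample file is created only when no nodes file is supplied.

```shell
# Path to server_priv/nodes; a small sample is written if none is given.
NODES_FILE="${NODES_FILE:-$(mktemp)}"
[ -s "$NODES_FILE" ] || printf '%s\n' computenode001.cluster.org computenode002.cluster.org > "$NODES_FILE"

DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands; 0 = run them

start_moms() {
  # The first field of each non-comment, non-empty line is the node name.
  for node in $(awk '!/^#/ && NF { print $1 }' "$NODES_FILE"); do
    if [ "$DRY_RUN" = 1 ]; then
      echo "ssh $node pbs_mom"
    else
      ssh -n "$node" pbs_mom
    fi
  done
}

start_moms
```

Review the printed commands first, then set DRY_RUN=0 to execute them.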

L.8 Verifying Correct TORQUE Installation

The pbs_server daemon was started on the TORQUE server when the torque.setup file was executed or when it was manually configured.  It must now be restarted so that it picks up the updated configuration.

# shut down server
> qterm

# start server
> pbs_server

# verify all queues are properly configured
> qstat -q

# view additional server configuration
> qmgr -c 'p s'

# verify all nodes are correctly reporting
> pbsnodes -a 

# submit a basic job
> echo "sleep 30" | qsub

# verify jobs display
> qstat

At this point, the job will not start because there is no scheduler running.  The scheduler is enabled in the next step below.
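
On larger clusters the pbsnodes -a listing above can be long. The following sketch condenses it into a count per node state. It assumes each node record contains a line of the form "state = <value>"; summarize_node_states is a hypothetical helper name.

```shell
# Print a count of nodes per state, reading `pbsnodes -a` output on stdin.
summarize_node_states() {
  awk '$1 == "state" && $2 == "=" { count[$3]++ }
       END { for (s in count) print s, count[s] }'
}

# Run the check only where the TORQUE client commands are installed.
if command -v pbsnodes >/dev/null 2>&1; then
  pbsnodes -a | summarize_node_states
fi
```

Any node counted under a state other than free at this stage is worth checking before jobs are scheduled.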

L.9 Enabling the Scheduler

Selecting the cluster scheduler is an important decision and significantly affects cluster utilization, responsiveness, availability, and intelligence.  The default TORQUE scheduler, pbs_sched, is very basic and will provide poor utilization of your cluster's resources.  Other options, such as Maui Scheduler or Moab Workload Manager, are highly recommended.  If using Maui/Moab, refer to the Moab-PBS Integration Guide.  If using pbs_sched, start this daemon now.

If you are installing ClusterSuite, TORQUE and Moab were configured at installation for interoperability and no further action is required.

L.10 Startup/Shutdown Service Script for TORQUE/Moab (OPTIONAL)

An optional startup/shutdown service script is provided as an example of how to run TORQUE as an OS service that starts at bootup.
  • Download the script here. (NOTE: this script was written specifically for Red Hat variants and may require modification to work with other Linux/UNIX distributions.)
  • Place the file in the /etc/init.d/ directory.
  • Make symbolic links (S99moab and K15moab, for example) in the desired runlevel directories (e.g., /etc/rc.d/rc3.d/ on Red Hat).
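
The link step might look like the sketch below. Several details are assumptions: the script file name moab is hypothetical (inferred from the example link names), the start link is placed in runlevel 3, and the stop links are placed in runlevels 0 and 6. ROOT defaults to a scratch directory so that nothing under /etc is modified.

```shell
# Scratch root when ROOT is unset; export ROOT as an empty string and
# run as root to create the links under the real /etc.
ROOT="${ROOT-$(mktemp -d)}"
SCRIPT_NAME="${SCRIPT_NAME:-moab}"      # hypothetical name of the script
TARGET="/etc/init.d/$SCRIPT_NAME"

mkdir -p "$ROOT/etc/init.d"

# S links run the script with "start" when the runlevel is entered.
mkdir -p "$ROOT/etc/rc.d/rc3.d"
ln -sf "$TARGET" "$ROOT/etc/rc.d/rc3.d/S99$SCRIPT_NAME"

# K links run the script with "stop" (0 = halt, 6 = reboot).
for rl in 0 6; do
  mkdir -p "$ROOT/etc/rc.d/rc${rl}.d"
  ln -sf "$TARGET" "$ROOT/etc/rc.d/rc${rl}.d/K15$SCRIPT_NAME"
done
```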
