Extract and build the distribution on the machine that will act as the "TORQUE server", that is, the machine that will monitor and control all compute nodes by running the pbs_server daemon. See the example below, where XXX stands for the latest distribution (e.g., "-1.2.0p4"):
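The build sequence can be sketched as follows. This is a minimal sketch that assumes the tarball is named torqueXXX.tar.gz and sits in the current directory. The TORQUE_VERSION and DRY_RUN variables and the run helper are illustrative and not part of TORQUE. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: build and install TORQUE from a source tarball on the server host.
# TORQUE_VERSION stands in for the XXX suffix (illustrative variable).
TORQUE_VERSION="${TORQUE_VERSION:--1.2.0p4}"
# DRY_RUN=1 (the default here) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

build_torque() {
    run tar -xzvf "torque${TORQUE_VERSION}.tar.gz" &&
    run cd "torque${TORQUE_VERSION}" &&
    run ./configure &&
    run make &&
    run make install
}

build_torque
```

Setting DRY_RUN=0 executes the commands. The final make install step normally requires root privileges.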
(OPTIONAL) Set the PATH environment variable. The default installation directories for the binaries are /usr/local/bin and /usr/local/sbin.
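For Bourne-style shells, the PATH setting might look like the following, assuming the default installation directories were not changed at configure time:

```shell
# Prepend the default TORQUE binary directories to the search path.
# Adjust both paths if a different prefix was chosen at configure time.
PATH="/usr/local/bin:/usr/local/sbin:$PATH"
export PATH
```

Adding these lines to a shell startup file such as ~/.profile makes the setting persistent across logins.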
See configure options for information on customizing the build at configure time.
In this document $(TORQUECFG) corresponds to where TORQUE stores its configuration files. This defaults to:
TORQUE 2.0p2 and higher include a standard spec file for building your own RPMs. It is also possible to use the checkinstall program to create your own RPM, tgz, or deb package.
1.1.1 Architecture
A TORQUE cluster consists of one headnode and many compute nodes. The headnode runs the pbs_server daemon and the compute nodes run the pbs_mom daemon. Client commands for submitting and managing jobs can be installed on any host, including hosts that don’t run pbs_server or pbs_mom.
The headnode will also run a scheduler daemon. The scheduler interacts with pbs_server to make local policy decisions for resource usage and to allocate nodes to jobs. A simple FIFO scheduler and code to construct more advanced schedulers are provided in the TORQUE source distribution, but most sites opt for a packaged advanced scheduler like Maui or Moab.
Users submit jobs to pbs_server using the qsub command. When pbs_server receives a new job, it informs the scheduler. If and when the scheduler finds nodes for the job, it sends pbs_server instructions to run the job, along with the nodelist. pbs_server then sends the new job to the first node in the nodelist, which launches it. This node is designated as the “execution host” or “Mother Superior”. Other nodes in a job are called “sister moms.”
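To make the submission flow concrete, here is a minimal sketch of a job script and its submission. The file name hello_job.sh, the job name, and the resource request are illustrative assumptions. The script body runs on the execution host, and PBS_NODEFILE names the file listing the nodes allocated to the job.

```shell
#!/bin/sh
# Write a small job script (file name and resource values are illustrative).
cat > hello_job.sh <<'EOF'
#!/bin/sh
#PBS -N hello
#PBS -l nodes=2,walltime=00:05:00
# The job script itself runs on the execution host (Mother Superior).
echo "Execution host: $(hostname)"
# PBS_NODEFILE points to the list of nodes allocated to this job.
cat "$PBS_NODEFILE"
EOF

# Submit only if the TORQUE client commands are installed on this host.
if command -v qsub >/dev/null 2>&1; then
    qsub hello_job.sh
else
    echo "qsub not found; would run: qsub hello_job.sh"
fi
```

Once submitted, the job stays queued until the scheduler assigns nodes to it; qstat -a shows its state.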
1.1.2 Compute Nodes
Several methods are available to install TORQUE on the compute nodes. Users with RPM-based Linux distributions can build RPMs directly from the source tarball in two ways:
If the defaults are acceptable, simply run rpmbuild -ta torque-xxx.tar.gz.
If special configure flags are required, untar and build as normal, but run make rpms at the end.
Users of all other operating systems are encouraged to use our “tpackage” system, which creates self-extracting tarballs that can be easily distributed and installed. To create tpackages, configure and make as normal, and then run make packages. Copy the tpackages to any other machines and execute them with --install. For example, xCAT users might do prcp torque-package-linux-i686.sh main:/tmp/; psh main /tmp/torque-package-linux-i686.sh --install.
The tpackages are very customizable. See the INSTALL file for additional options and features.
Optionally, the TORQUE server can also be used as a compute node by installing a pbs_mom alongside the pbs_server daemon.
Example: Compute Node Installation
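A minimal sketch of a compute node installation using tpackages follows. It assumes the package was already built in the source tree with ./configure, make, and make packages, that the package file is named as in the xCAT example above, and that scp and ssh access to the nodes is available (scp and ssh stand in here for a cluster copy tool such as prcp and psh). The node names and the deploy_commands helper are hypothetical. The script only prints the copy and install commands so that they can be reviewed first.

```shell
#!/bin/sh
# Package file name taken from the xCAT example; adjust to what
# "make packages" actually produced on the build host.
PACKAGE="${PACKAGE:-torque-package-linux-i686.sh}"
# Hypothetical compute node names, separated by spaces.
NODES="${NODES:-node01 node02}"

# Print one copy command and one install command per node.
deploy_commands() {
    for node in $NODES; do
        echo "scp $PACKAGE $node:/tmp/"
        echo "ssh $node /tmp/$PACKAGE --install"
    done
}

deploy_commands
```

After reviewing the output, the commands can be executed by changing the last line to deploy_commands | sh.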
Both pbs_iff (mandatory) and pbs_rcp (optional) will be installed suid root.
1.1.3 Upgrading TORQUE
Upgrading TORQUE can generally be done without shutting down the whole cluster and disrupting running jobs. Simply build and install the new version and restart the daemons. Here is the safest procedure for upgrading TORQUE:
Kill the scheduler.
Wait a few minutes for all new jobs to complete startup.
All running jobs in qstat -a have some elapsed walltime.
Restart pbs_server.
Verify the new pbs_server is working correctly.
nodes should come up (not down or state-unknown)
job walltimes should increase
If upgrading from an earlier 2.1 build, MOMs can automatically restart themselves with:
momctl -q enablemomrestart=1 -h :ALL
Start the scheduler.
If upgrading from 2.0 or earlier:
Restart MOMs on all idle nodes.
Wait a minute, make sure node and job states are updating correctly.
Delete the previous static archive library files:
libattr.a
libcmds.a
liblog.a
libnet.a
libpbs.a
libsite.a
Mark busy nodes offline.
Start the scheduler.
Restart MOMs on offline nodes after their jobs exit.
All external software, such as Maui, perl-PBS, or pbs_python, that was built with the 2.0.x static archives will need to be rebuilt with the newer 2.1.x shared libraries.
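The verification step in the procedure above (nodes should not be down or state-unknown) can be partly automated. The following is a sketch that assumes pbsnodes -a prints each node name at the start of a line, followed by indented attribute lines of the form "state = ...". The unhealthy_nodes helper is hypothetical and not part of TORQUE.

```shell
#!/bin/sh
# Read "pbsnodes -a" style output on stdin and print the name of every
# node whose state includes "down" or "state-unknown".
unhealthy_nodes() {
    awk '
        # A line that starts in column one is a node name.
        /^[^ \t]/ { node = $1 }
        # Indented "state = ..." lines carry the state of the current node.
        $1 == "state" && $2 == "=" && $3 ~ /down|state-unknown/ { print node }
    '
}

# Run the check only where the TORQUE client commands are installed.
if command -v pbsnodes >/dev/null 2>&1; then
    pbsnodes -a | unhealthy_nodes
fi
```

An empty result after restarting pbs_server suggests that all nodes have reported in. Job walltimes still need to be checked separately with qstat -a.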
1.1.4 Enabling TORQUE/Moab as a Service (OPTIONAL)
An optional startup/shutdown service script is provided as an example of how to run TORQUE as an OS service that starts at bootup.
Download the script here. (NOTE: this script was written specifically for Red Hat variants and may require modification to work with other Linux/UNIX distributions.)
Place the file in the /etc/init.d/ directory.
Make symbolic links (S99moab and K15moab, for example) in the desired runlevel directories (e.g., /etc/rc.d/rc3.d/ on Red Hat).
This can be added to the self-extracting packages (see INSTALL for details).
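The link creation step might be sketched as below. The choice of runlevels (start in 3, stop in 0 and 6) is an assumption for illustration, as are the PREFIX staging directory and the link_service helper. With the default PREFIX, the links are created under /tmp so that the sketch can be tried without touching the real /etc.

```shell
#!/bin/sh
# PREFIX is a hypothetical staging root; run with PREFIX= (empty) as root
# to operate on the real /etc/rc.d tree.
PREFIX="${PREFIX-/tmp/torque-initd-demo}"
# Name of the script that was placed in /etc/init.d/.
SCRIPT_NAME="moab"

# $1 = runlevel, $2 = link prefix such as S99 (start) or K15 (kill).
link_service() {
    dir="$PREFIX/etc/rc.d/rc$1.d"
    mkdir -p "$dir"
    ln -sf "/etc/init.d/$SCRIPT_NAME" "$dir/$2$SCRIPT_NAME"
}

link_service 3 S99   # start the service in runlevel 3
link_service 0 K15   # stop the service on halt
link_service 6 K15   # stop the service on reboot
```

The numbers in S99 and K15 control ordering: scripts with higher S numbers start later, and scripts with lower K numbers are stopped earlier.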