TORQUE Resource Manager

2.1 Job Submission

  • 2.1.1 Multiple Jobs Submission
  • 2.1.2 Requesting Resources
  • 2.1.3 Requesting Generic Resources
  • 2.1.4 Requesting Floating Resources
  • 2.1.5 Requesting Other Resources
  • 2.1.6 Exported Batch Environment Variables
  • 2.1.7 Enabling Trusted Submit Hosts

Job submission is accomplished using the qsub command, which takes a number of command line arguments and integrates them into the specified PBS command file. The PBS command file may be specified as a filename on the qsub command line or may be entered via STDIN.

  • The PBS command file does not need to be executable.
  • The PBS command file may be piped into qsub (for example, cat pbs.cmd | qsub).
  • In the case of parallel jobs, the PBS command file is staged to, and executed on, the first allocated compute node only. (Use pbsdsh to run actions on multiple nodes.)
  • The command script is executed from the user's home directory in all cases. (The script may determine the submission directory by using the $PBS_O_WORKDIR environment variable.)
  • The command script will be executed using the default set of user environment variables unless the -V or -v flags are specified to include aspects of the job submission environment.
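
As a rough illustration of the points above, a minimal job script might look like the following sketch. The file name pbs.cmd is taken from the piping example; the fallback to the current directory when PBS_O_WORKDIR is unset is an assumption added so the script can also be tried outside of TORQUE.

```shell
#!/bin/sh
# Minimal job script sketch (for example, saved as pbs.cmd).
# TORQUE starts the script in the user's home directory, so change to the
# directory the job was submitted from. The fallback to the current
# directory is an assumption so the script also runs outside of TORQUE.
workdir="${PBS_O_WORKDIR:-$PWD}"
cd "$workdir" || exit 1

echo "running in: $(pwd)"
```

Such a script could be submitted with qsub pbs.cmd or cat pbs.cmd | qsub. Adding -V to the qsub command line would pass the submission environment through to the job.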

By default, job submission is allowed only on the TORQUE server host (the host on which pbs_server is running). To enable job submission from other hosts, see Configuring Job Submit Hosts.

2.1.1 Multiple Jobs Submission

Sometimes users will want to submit large numbers of jobs based on the same job script. Rather than using a script to repeatedly call qsub, a feature known as job arrays now exists to allow the creation of multiple jobs with one qsub command. Additionally, this feature includes a new job naming convention that allows users to reference the entire set of jobs as a unit, or to reference one particular job from the set.

Example
qsub -t 0-4 job_script
1098.hostname

qstat
1098-0.hostname ...
1098-1.hostname ...
1098-2.hostname ...
1098-3.hostname ...
1098-4.hostname ...

Job arrays are submitted through the -t option to qsub, or by using #PBS -t in your batch script. This option takes a comma-separated list in which each element is either a single array index or a range given as two numbers separated by a dash. Each job created uses the same script and runs in a nearly identical environment.

Versions of TORQUE earlier than 2.3 had different semantics for the -t argument. In these versions, -t took a single integer: a count of the number of jobs to be created.

Syntax Example
qsub -t 0-99 is equivalent to qsub -t 100 in TORQUE 2.2.

You can also pass comma-delimited lists of IDs and ranges:

qsub -t 0,10,20,30,40

or

qsub -t 0-50,60,70,80
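
To make the list syntax concrete, the following sketch expands a -t style specification into the individual array indices it describes. The function expand_array_spec is a hypothetical helper written for illustration; it is not part of TORQUE and only mirrors the comma and dash rules described above.

```shell
#!/bin/sh
# expand_array_spec: print one array index per line for a -t style
# specification such as "0-50,60,70,80". Hypothetical helper for
# illustration only; it is not part of TORQUE.
expand_array_spec() {
    old_ifs=$IFS
    IFS=,
    set -- $1            # split the specification on commas
    IFS=$old_ifs
    for item in "$@"; do
        case $item in
            *-*)
                # A range: count from the first number to the last.
                first=${item%-*}
                last=${item#*-}
                i=$first
                while [ "$i" -le "$last" ]; do
                    echo "$i"
                    i=$((i + 1))
                done
                ;;
            *)
                # A single index.
                echo "$item"
                ;;
        esac
    done
}

expand_array_spec "0-2,10,20"
```

Piping the output through wc -l gives the number of jobs a specification would create, which can be a useful check before submitting a large array.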

Each 1098-x job has an environment variable called PBS_ARRAYID, which is set to the value of the array index of the job, so 1098-0.hostname would have PBS_ARRAYID set to 0. This will allow you to create job arrays where each job in the array will perform slightly different actions based on the value of this variable, such as performing the same tasks on different input files. One other difference in the environment between jobs in the same array is the value of the PBS_JOBNAME variable.
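
A job script could use this variable along the following lines. This is only a sketch: the input.N and output.N file names are a hypothetical naming convention, and the default index of 0 is an assumption so the script can be tried outside of TORQUE.

```shell
#!/bin/sh
# Job array script sketch: each job in the array picks its own input file
# from its array index. The input.N / output.N naming is a hypothetical
# convention, and the default index of 0 is an assumption so the script
# can be tried outside of TORQUE.
index="${PBS_ARRAYID:-0}"
input_file="input.${index}"
output_file="output.${index}"

echo "array index ${index}: reading ${input_file}, writing ${output_file}"
```

Submitted with qsub -t 0-4 job_script, the five jobs would work on input.0 through input.4.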

Currently, each job in the array shows up when qstat is run. Essentially they are fully independent TORQUE jobs, and all normal TORQUE commands work on the individual jobs. Eventually, as job arrays are further developed, qstat will display a single entry that summarizes the array, and an additional qstat flag will show the details of an array. Currently the only TORQUE command that operates on an array as a whole is the qdel command. In the previous example, qdel 1098 would delete every job in the array, while qdel 1098-0 would delete just that one job. Support for qhold and qrls on an entire array will be available shortly, and array awareness will be added to all TORQUE commands one at a time.

Please be aware that job arrays are under development and may have bugs, and certainly are not feature complete. If you have any suggestions or bug reports, please bring them to our attention.

We are currently aware of one limitation that can cause the creation of large job arrays to fail on TORQUE installations with high job IDs. This is due to a historical file name length limitation. With thousands of similarly named jobs (for the time being, each of these jobs has its own job file within pbs_server, although they all share one copy of the script file while on the server), the file name hashing algorithm that attempts to create unique short file names encounters limitations. This limitation for job arrays or sharing the job file will eventually be removed.

2.1.2 Requesting Resources

Various resources can be requested at the time of job submission. A job can request a particular node, a particular node attribute, or even a number of nodes with particular attributes. Either native TORQUE resources, or external scheduler resource extensions may be specified. The native TORQUE resources are listed in the following table:

Resource Format Description
arch string Specifies the administrator-defined system architecture required. This defaults to whatever the PBS_MACH string is set to in "local.mk".
cput seconds, or [[HH:]MM:]SS Maximum amount of CPU time used by all processes in the job.
file size* The amount of total disk requested for the job. (Ignored on Unicos.)
host string Name of the host on which the job should be run. This resource is provided for use by the site's scheduling policy. The allowable values and effect on job placement are site dependent.
mem size* Maximum amount of physical memory used by the job. (Ignored on Darwin, Digital Unix, FreeBSD, HPUX 11, IRIX, NetBSD, and SunOS. Also ignored on Linux if number of nodes is not 1. Not implemented on AIX and HPUX 10.)
nice integer Number between -20 (highest priority) and 19 (lowest priority). Adjusts the process execution priority.
nodes {<node_count> | <hostname>} [:ppn=<ppn>][:<property>[:<property>]...] [+ ...] Number and/or type of nodes to be reserved for exclusive use by the job. The value is one or more node_specs joined with the + (plus) character: node_spec[+node_spec...]. Each node_spec is a number of nodes required of the type declared in the node_spec and a name of one or more properties desired for the nodes. The number, the name, and each property in the node_spec are separated by a : (colon). If no number is specified, one (1) is assumed.

The name of a node is its hostname. The properties of nodes are:

  • ppn=# - specify the number of processors per node requested. Defaults to 1.
  • property - a string assigned by the system administrator specifying a node's features. Check with your administrator as to the node names and properties available to you.
See Example 1 (-l nodes) for examples.

NOTE: By default, the node resource is mapped to a virtual node (that is, directly to a processor, not a full physical compute node). This behavior can be changed within Maui or Moab by setting the JOBNODEMATCHPOLICY parameter. (See Appendix F of the Moab Workload Manager Administrator's Guide for more information.)

opsys string Specifies the administrator-defined operating system as defined in the MOM configuration file.
other string Allows a user to specify site-specific information. This resource is provided for use by the site's scheduling policy. The allowable values and effect on job placement are site dependent.
pcput seconds, or [[HH:]MM:]SS Maximum amount of CPU time used by any single process in the job.
pmem size* Maximum amount of physical memory used by any single process of the job. (Ignored on Fujitsu. Not implemented on Digital Unix and HPUX.)
pvmem size* Maximum amount of virtual memory used by any single process in the job. (Ignored on Unicos.)
software string Allows a user to specify software required by the job. This is useful if certain software packages are only available on certain systems in the site. This resource is provided for use by the site's scheduling policy. The allowable values and effect on job placement are site dependent. (See Scheduler License Management in the Moab Workload Manager Administrator's Guide for more information.)
vmem size* Maximum amount of virtual memory used by all concurrent processes in the job. (Ignored on Unicos.)
walltime seconds, or [[HH:]MM:]SS Maximum amount of real time during which the job can be in the running state.


*size format:
The size format specifies the maximum amount in terms of bytes or words. It is expressed in the form integer[suffix]. The suffix is a multiplier defined in the following table ('b' means bytes (the default) and 'w' means words). The size of a word is the word size of the execution server.

Suffix (bytes)  Suffix (words)  Multiplier
b               w               1
kb              kw              1,024
mb              mw              1,048,576
gb              gw              1,073,741,824
tb              tw              1,099,511,627,776
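
The multipliers in the suffix table can be applied as in the following sketch. The function size_to_bytes is a hypothetical helper for illustration, not a TORQUE command. It handles only lowercase byte suffixes and rejects word suffixes, because the word size depends on the execution server.

```shell
#!/bin/sh
# size_to_bytes: convert a byte-suffixed size such as "200mb" to bytes,
# using the multipliers from the suffix table. Hypothetical helper for
# illustration; word suffixes (w, kw, ...) are rejected because the word
# size depends on the execution server.
size_to_bytes() {
    number=${1%%[!0-9]*}      # leading digits
    suffix=${1#"$number"}     # whatever follows the digits
    case $suffix in
        ''|b) multiplier=1 ;;
        kb)   multiplier=1024 ;;
        mb)   multiplier=1048576 ;;
        gb)   multiplier=1073741824 ;;
        tb)   multiplier=1099511627776 ;;
        *)    echo "unsupported size: $1" >&2; return 1 ;;
    esac
    [ -n "$number" ] || { echo "unsupported size: $1" >&2; return 1; }
    echo $((number * multiplier))
}

size_to_bytes 200mb
```

For 200mb the sketch prints 209715200, that is, 200 times 1,048,576.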

Example 1 (-l nodes)

> qsub -l nodes=12
Request 12 nodes of any type.

> qsub -l nodes=2:server+14
Request 2 "server" nodes and 14 other nodes (a total of 16). This specifies two node_specs, "2:server" and "14".

> qsub -l nodes=server:hippi+10:noserver+3:bigmem:hippi
Request (a) 1 node that is a "server" and has a "hippi" interface, (b) 10 nodes that are not servers, and (c) 3 nodes that have a large amount of memory and have hippi.

> qsub -l nodes=b2005+b1803+b1813
Request 3 specific nodes by hostname.

> qsub -l nodes=4:ppn=2
Request 2 processors on each of four nodes.

> qsub -l nodes=1:ppn=4
Request 4 processors on one node.

> qsub -l nodes=2:blue:ppn=2+red:ppn=3+b1014
Request 2 processors on each of two blue nodes, three processors on one red node, and the compute node "b1014".
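
To check how many processors a nodes value asks for in total, the node_spec rules can be applied mechanically, as in the sketch below. The function count_procs is a hypothetical helper for illustration and not part of TORQUE. It assumes only what is stated in the nodes description: node_specs are joined with +, a missing count means one node, and ppn defaults to 1.

```shell
#!/bin/sh
# count_procs: total processors requested by a nodes= value such as
# "2:blue:ppn=2+red:ppn=3+b1014". Hypothetical helper for illustration;
# node_specs are joined with "+", a missing count means 1, and ppn
# defaults to 1.
count_procs() {
    total=0
    old_ifs=$IFS
    IFS=+
    set -- $1                 # split into node_specs
    IFS=$old_ifs
    for spec in "$@"; do
        count=1
        ppn=1
        first=${spec%%:*}
        case $first in
            ''|*[!0-9]*) ;;           # hostname or property: one node
            *) count=$first ;;        # leading number: node count
        esac
        rest=$spec
        while [ -n "$rest" ]; do
            field=${rest%%:*}
            case $field in
                ppn=*) ppn=${field#ppn=} ;;
            esac
            case $rest in
                *:*) rest=${rest#*:} ;;
                *)   rest= ;;
            esac
        done
        total=$((total + count * ppn))
    done
    echo "$total"
}

count_procs "2:blue:ppn=2+red:ppn=3+b1014"
```

For the last example in the list this gives 2×2 + 1×3 + 1×1 = 8 processors.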

Example 2

> qsub -l mem=200mb /home/user/script.sh
This job requests a node with 200 MB of available memory.

Example 3

> qsub -l nodes=node01,mem=200mb /home/user/script.sh
This job will wait until node01 is free with 200 MB of available memory.
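
Resource requests can also be written into the job script as #PBS directive lines, in the same way as the #PBS -t form mentioned for job arrays. The following is a sketch with illustrative values only; the variable named requested exists purely so the script prints something when tried outside of TORQUE.

```shell
#!/bin/sh
# Job script sketch with resource requests embedded as #PBS directives
# instead of qsub command line options. The values are illustrative.
#PBS -l nodes=1:ppn=4
#PBS -l mem=200mb,walltime=01:00:00

# Outside of TORQUE the directives are plain comments, so the script
# still runs. The variable below only echoes what was requested above.
requested="nodes=1:ppn=4,mem=200mb,walltime=01:00:00"
echo "requested resources: $requested"
```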

2.1.3 Requesting Generic Resources

When generic resources have been assigned to nodes using the server's nodes file, these resources can be requested at the time of job submission using the other field. (See the Consumable Generic Resources page in the Moab Workload Manager Administrator's Guide for details on configuration within Moab.)

Example 1

> qsub -l other=matlab /home/user/script.sh
This job will run on any node that has the generic resource matlab.
 
NOTE: This can also be requested at the time of job submission using the -W x=GRES:matlab flag.

2.1.4 Requesting Floating Resources

When floating resources have been set up inside Moab, they can be requested in the same way as generic resources. Moab will automatically understand that these resources are floating and will schedule the job accordingly. (See the Floating Generic Resources page in the Moab Workload Manager Administrator's Guide for details on configuration within Moab.)

Example 2

> qsub -l other=matlab /home/user/script.sh
This job will run on any node when there are enough floating resources available.
 
NOTE: This can also be requested at the time of job submission using the -W x=GRES:matlab flag.

2.1.5 Requesting Other Resources

Many other resources can be requested at the time of job submission using the Moab Workload Manager. See the Resource Manager Extensions page in the Moab Workload Manager Administrator's Guide for a list of these supported requests and correct syntax.

2.1.6 Exported Batch Environment Variables

When a batch job is started, a number of variables are introduced into the job's environment that can be used by the batch script in making decisions, creating output files, and so forth. These variables are listed in the following table:

Variable         Description
PBS_JOBNAME      user-specified job name
PBS_ARRAYID      zero-based value of job array index for this job (in version 2.2.0 and later)
PBS_O_WORKDIR    job's submission directory
PBS_ENVIRONMENT  N/A
PBS_TASKNUM      number of tasks requested
PBS_O_HOME       home directory of submitting user
PBS_MOMPORT      active port for MOM daemon
PBS_O_LOGNAME    name of submitting user
PBS_O_LANG       language variable for job
PBS_JOBCOOKIE    job cookie
PBS_NODENUM      node offset number
PBS_O_SHELL      script shell
PBS_JOBID        unique PBS job ID
PBS_O_HOST       host from which the job was submitted
PBS_QUEUE        job queue
PBS_NODEFILE     file containing a line-delimited list of nodes allocated to the job
PBS_O_PATH       path variable used to locate executables within job script
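
A batch script might use some of these variables as in the sketch below, which reports the nodes allocated to the job. The temporary stand-in for PBS_NODEFILE, including the host names node01 and node02, is an assumption so the fragment can be tried outside of TORQUE; inside a job the file is provided by the batch system.

```shell
#!/bin/sh
# Sketch of a job script fragment that reports the nodes allocated to the
# job. The temporary stand-in for PBS_NODEFILE (with the hypothetical host
# names node01 and node02) is an assumption so the fragment can be tried
# outside of TORQUE.
if [ -z "${PBS_NODEFILE:-}" ]; then
    PBS_NODEFILE=$(mktemp)
    printf 'node01\nnode01\nnode02\n' > "$PBS_NODEFILE"
fi

# The file holds one entry per line; a host may appear on more than one
# line, for example when several processors are requested on it.
entries=$(wc -l < "$PBS_NODEFILE")
hosts=$(sort -u "$PBS_NODEFILE" | wc -l)

echo "job ${PBS_JOBNAME:-unnamed} in queue ${PBS_QUEUE:-unknown}"
echo "allocated entries: $((entries)), distinct hosts: $((hosts))"
```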

2.1.7 Enabling Trusted Submit Hosts

By default, only the node running the pbs_server daemon is allowed to submit jobs. Additional nodes can be trusted as submit hosts by taking any of the following steps:

  • Set the allow_node_submit server parameter.
    • Allows any host trusted as a compute host to also be trusted as a submit host.
  • Set the submit_hosts server parameter (comma-delimited).
    • Allows the specified hosts to be trusted as submit hosts.
  • Use .rhosts to enable ruserok() based authentication.

See Job Submission Host Advanced Config for more information.

NOTE: If allow_node_submit is set, the parameter allow_proxy_user must be set to allow user proxying when submitting/running jobs.
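
The server parameters above are typically set with qmgr on the pbs_server host. The following sketch only prints the commands by default. The host names login01 and login02 are hypothetical, and the RUN_QMGR switch and the set_server_param wrapper are assumptions made for this example rather than TORQUE features.

```shell
#!/bin/sh
# Sketch of setting the server parameters above with qmgr. The host names
# login01 and login02 are hypothetical. By default the commands are only
# printed; set RUN_QMGR=1 (an assumed switch for this sketch) to execute
# them on the pbs_server host.
set_server_param() {
    cmd="set server $1"
    if [ "${RUN_QMGR:-0}" = "1" ]; then
        qmgr -c "$cmd"
    else
        echo "qmgr -c '$cmd'"
    fi
}

# Trust every compute node as a submit host.
set_server_param "allow_node_submit = true"

# Or trust only specific hosts, adding them one at a time.
set_server_param "submit_hosts = login01"
set_server_param "submit_hosts += login02"
```

Only one of the two approaches is normally needed. Reviewing the printed commands before running them with RUN_QMGR=1 keeps the change to the server configuration deliberate.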

See Also

  • Maui Documentation
  • qsub wrapper - Allow local checking and modification of submitted jobs