[torqueusers] Torque Jobs Stay Queued

Ablen ablenzo at hotmail.com
Tue Oct 9 08:19:47 MDT 2012


Hello friends,

I am working to install Torque on a Fedora 16 (FC16) Linux cluster.  So far I
have set up only what it needs to run on the master node, and I think I have
done everything correctly.  When I submit a job, however, it stays in the
queued state and never runs.  I think there must be a minor step I've missed.
Below are the steps I've taken to set up Torque, as well as the sample job I am
trying.  Could someone please let me know what I may still need to do in order
for this job to run?  All comments appreciated.

Many thanks.
ablen

1 – Log into server as root
2 – Edit the /etc/hosts file and change the first line so it looks like this:

127.0.0.1 mysrv  localhost.localdomain localhost
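
To double-check which address a name ends up mapped to in a hosts-style file, something like the following could be used. It is only a sketch: the helper name hosts_lookup and the temporary sample file are made up for illustration, and it reads a copy of the line above, not the real /etc/hosts.

```shell
#!/bin/sh
# hosts_lookup FILE NAME: print every address that FILE maps NAME to.
# (hypothetical helper, not part of Torque)
hosts_lookup() {
    awk -v name="$2" '
        /^[ \t]*#/ { next }                  # skip comment lines
        { for (i = 2; i <= NF; i++) if ($i == name) print $1 }
    ' "$1"
}

# Sample file mirroring the line from step 2.
sample_hosts=$(mktemp)
printf '127.0.0.1 mysrv  localhost.localdomain localhost\n' > "$sample_hosts"

hosts_lookup "$sample_hosts" mysrv    # prints 127.0.0.1
```

On the real machine the first argument would be /etc/hosts, and `getent hosts mysrv` shows what the system resolver returns for the same name.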

3 – yum install openssl-devel libxml2-devel
4 – yum -y install 'torque*'
5 - pbs_server -t create
6 - systemctl start pbs_{mom,server,sched}.service
7 - systemctl enable pbs_{mom,server,sched}.service
8 -  /usr/local/sbin/trqauthd start
9 – pbs_server
10 – vi /var/spool/torque/server_name and also vi /etc/torque/server_name

change server name to mysrv if needed

11 – vi /var/spool/torque/mom_priv/config and vi /etc/torque/mom/config
add/modify this line:

$pbsserver mysrv

12 – vi /var/spool/torque/server_priv/nodes   (create this file) and add all
nodes in the cluster like this (np is the number of processors; VERIFY THAT
EACH NODE REALLY HAS 4 processors).

mysrv np=4
node2 np=4
node3 np=4
…
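
As a sanity check on a nodes file like the one above, the np values can be added up with a short script. This is a sketch under stated assumptions: total_np is a hypothetical helper, the sample file is a temporary copy of the three entries shown, and a node listed without np= is assumed to count as one processor.

```shell
#!/bin/sh
# total_np FILE: sum the np= values in a Torque-style nodes file.
# (hypothetical helper, not part of Torque)
total_np() {
    awk '
        /^[ \t]*#/ { next }                  # skip comment lines
        NF == 0    { next }                  # skip blank lines
        {
            np = 1                           # assumption: no np= means 1 processor
            for (i = 2; i <= NF; i++)
                if ($i ~ /^np=[0-9]+$/) { split($i, kv, "="); np = kv[2] }
            total += np
        }
        END { print total + 0 }
    ' "$1"
}

# Sample file with the three entries from step 12.
sample_nodes=$(mktemp)
printf 'mysrv np=4\nnode2 np=4\nnode3 np=4\n' > "$sample_nodes"

total_np "$sample_nodes"    # prints 12
```

Running `nproc` on each node and comparing the result with that node's np entry would cover the "verify 4 processors" reminder.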

13 - vi /etc/sysconfig/network (and make sure that HOSTNAME is set as follows):

HOSTNAME=mysrv

14 - Append these lines to the /etc/profile file (for bash)
PBS_DEFAULT=mysrv
export PBS_DEFAULT

Append these lines to the /etc/bashrc file (also for bash)
PBS_DEFAULT=mysrv
export PBS_DEFAULT

15 – execute all of the following commands:

qmgr -c "set server operators += root@mysrv"
qmgr -c "set server managers += root@mysrv"
qmgr -c 'create queue batch'
qmgr -c 'set queue batch queue_type = execution'
qmgr -c 'set queue batch started = true'
qmgr -c 'set queue batch enabled = true'
qmgr -c 'set queue batch resources_default.walltime = 480:00:00'
qmgr -c 'set queue batch resources_default.nodes = 1'
qmgr -c 'set queue batch max_running = 1000'
qmgr -c 'set server default_queue = batch'
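
The queue settings above can also be kept in one file and fed to qmgr on standard input, which makes them easier to review and re-apply. The sketch below assumes a temporary file (the variable name queue_cmds is made up) and only calls qmgr when it is installed; otherwise it just prints what would be sent.

```shell
#!/bin/sh
# Collect the queue settings from step 15 in one file (hypothetical name).
queue_cmds=$(mktemp)
cat > "$queue_cmds" <<'EOF'
create queue batch
set queue batch queue_type = execution
set queue batch started = true
set queue batch enabled = true
set queue batch resources_default.walltime = 480:00:00
set queue batch resources_default.nodes = 1
set queue batch max_running = 1000
set server default_queue = batch
EOF

# Apply only when qmgr is available; otherwise show the commands.
if command -v qmgr >/dev/null 2>&1; then
    qmgr < "$queue_cmds"
else
    cat "$queue_cmds"
fi
```

Afterwards, `qmgr -c 'print server'` prints the resulting server and queue configuration for comparison with the file.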

16 – Log into a non-root account and run these commands as a preliminary test:

qmgr -c "list server"
qmgr -c "list queue batch"

17 – Submit  a test job from the nonroot account, then view it using qstat:

echo "sleep 30" | qsub
qstat

Results look like this:

[mine@mysrv ~]$ qstat
Job id                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
0.mysrv                    STDIN            mine                   0 Q batch
1.mysrv                    STDIN            mine                   0 Q batch
[mine@mysrv ~]$
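
To keep an eye on how many jobs are stuck in a given state, the qstat listing can be summarized with a small filter. This is a sketch, not a diagnosis: count_state is a hypothetical helper, it assumes the two header lines and the column layout shown above, and the live commands at the end run only if Torque's client tools are on the PATH.

```shell
#!/bin/sh
# count_state STATE: read qstat-style output on stdin, count jobs in STATE.
# (hypothetical helper, not part of Torque)
count_state() {
    awk -v want="$1" '
        # skip the two header lines; the state is the next-to-last column
        NR > 2 && NF >= 6 && $(NF - 1) == want { n++ }
        END { print n + 0 }
    '
}

# Sample listing copied from the output above.
sample_qstat='Job id                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
0.mysrv                    STDIN            mine                   0 Q batch
1.mysrv                    STDIN            mine                   0 Q batch'

printf '%s\n' "$sample_qstat" | count_state Q    # prints 2

# With Torque installed, the same filter can read live output, and
# pbsnodes -a lists the state the server reports for each node.
if command -v qstat >/dev/null 2>&1; then
    qstat | count_state Q
    pbsnodes -a
fi
```

For a single job, `qstat -f` followed by the job id prints the full attribute list, which may include a comment explaining why the job has not started.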


