[torqueusers] Problems with Torque configuration

Davi Vercillo davivercillo.aux at gmail.com
Fri Nov 23 11:42:51 MST 2007


Hi all,

I'm having trouble configuring my lab to run Torque. I followed the
Wiki instructions, but they didn't help.

When I try to submit a simple job, like this:

#!/bin/bash
#PBS -N testaCluster
#PBS -o /home/temp/cluster.out
#PBS -e /home/temp/cluster.err

echo "It works !!"

I then run the command tracejob <number>, and it returns:

/var/spool/torque/mom_logs/20071123: No such file or directory
/var/spool/torque/sched_logs/20071123: No matching job records located

Job: 23.bangu00.dcc.ufrj.br

11/23/2007 16:14:25  S    enqueuing into batch, state 1 hop 1
11/23/2007 16:14:25  S    Job Queued at request of
                          davivercillo at bangu00.dcc.ufrj.br, owner =
                          davivercillo at bangu00.dcc.ufrj.br, job name =
                          testaCluster, queue = batch
11/23/2007 16:14:25  A    queue=batch
11/23/2007 16:14:37  S    Job Modified at request of
                          Scheduler at bangu00.dcc.ufrj.br
11/23/2007 16:14:37  S    Job Run at request of
                          Scheduler at bangu00.dcc.ufrj.br
11/23/2007 16:14:39  S    unable to run job, MOM rejected/rc=2
11/23/2007 16:14:39  S    Job Modified at request of
                          Scheduler at bangu00.dcc.ufrj.br
11/23/2007 16:24:51  S    Job Modified at request of
                          Scheduler at bangu00.dcc.ufrj.br
11/23/2007 16:24:51  S    Job Run at request of
                          Scheduler at bangu00.dcc.ufrj.br
11/23/2007 16:24:53  S    unable to run job, MOM rejected/rc=2
11/23/2007 16:24:53  S    Job Modified at request of
                          Scheduler at bangu00.dcc.ufrj.br
11/23/2007 16:35:05  S    Job Modified at request of
                          Scheduler at bangu00.dcc.ufrj.br
11/23/2007 16:35:05  S    Job Run at request of
                          Scheduler at bangu00.dcc.ufrj.br
11/23/2007 16:35:07  S    unable to run job, MOM rejected/rc=2
11/23/2007 16:35:07  S    Job Modified at request of
                          Scheduler at bangu00.dcc.ufrj.br
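
The log above repeats the same rejection on every scheduler pass. As a rough sketch, the trace could be summarized with two small shell helpers. The function names count_mom_rejects and mom_reject_codes are made up for illustration, and the sketch assumes the tracejob output is piped in or saved to a file first:

```shell
#!/bin/sh
# Hypothetical helper: count "MOM rejected" entries in tracejob output
# read from standard input.
count_mom_rejects() {
    grep -c 'unable to run job, MOM rejected'
}

# Hypothetical helper: print the distinct return codes that appear
# in those entries, one per line.
mom_reject_codes() {
    sed -n 's/.*MOM rejected\/rc=\([0-9][0-9]*\).*/\1/p' | sort -u
}
```

For example, tracejob <number> | count_mom_rejects would print how many times the MOM refused the job, and tracejob <number> | mom_reject_codes would list the rc values involved.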

My configuration files are:

Head node (hostname = bangu00):

/var/spool/torque/server_name
bangu00

/var/spool/torque/server_priv/nodes
bangu01 np=1
bangu02 np=1
bangu03 np=1
bangu04 np=1
bangu05 np=1
bangu06 np=1
bangu07 np=1
bangu08 np=1
bangu09 np=1
bangu10 np=1
bangu11 np=1
bangu12 np=1
bangu13 np=1
bangu15 np=1
bangu16 np=1
bangu17 np=1
bangu18 np=1
bangu19 np=1
bangu20 np=1
bangu21 np=1
bangu22 np=1
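
The list above skips one number in the bangu01 to bangu22 range; whether that node is left out on purpose is not stated here. A minimal sketch for spotting such gaps in a nodes file follows. The function missing_nodes and its arguments are hypothetical, and it assumes hostnames follow a prefix plus two-digit number pattern like the one above:

```shell
#!/bin/sh
# Hypothetical helper: print every expected hostname that has no entry
# in the nodes file.
#   $1 = path to the nodes file
#   $2 = hostname prefix (e.g. bangu)
#   $3 = first node number, $4 = last node number (plain decimals, no leading zero)
missing_nodes() {
    nodes_file=$1
    prefix=$2
    i=$3
    last=$4
    while [ "$i" -le "$last" ]; do
        name=$(printf '%s%02d' "$prefix" "$i")
        # Match the hostname only as the first word of a line.
        grep -Eq "^${name}([[:space:]]|\$)" "$nodes_file" || echo "$name"
        i=$((i + 1))
    done
}
```

Calling missing_nodes /var/spool/torque/server_priv/nodes bangu 1 22 would print one line for each hostname in that range that the file does not mention.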

#qmgr -c "print server"
#
# Create queues and set their attributes.
#
#
# Create and define queue batch
#
create queue batch
set queue batch queue_type = Execution
set queue batch acl_host_enable = True
set queue batch keep_completed = 10
set queue batch enabled = True
set queue batch started = True
#
# Set server attributes.
#
set server scheduling = True
set server managers = davivercillo at bangu00.dcc.ufrj.br
set server managers += root at bangu00.dcc.ufrj.br
set server operators = davivercillo at bangu00.dcc.ufrj.br
set server operators += root at bangu00.dcc.ufrj.br
set server default_queue = batch
set server log_events = 511
set server mail_from = adm
set server resources_default.nodes = 1
set server scheduler_iteration = 600
set server node_check_rate = 150
set server tcp_timeout = 6
set server pbs_version = 2.1.9

Nodes (hostnames = bangu01 ~ bangu22):

#cat /var/spool/torque/server_name
bangu00

#cat /var/spool/torque/pbs_environment
PATH=/bin:/usr/bin
LANG=pt_BR.UTF-8

#cat /var/spool/torque/mom_priv/config
$pbsserver      bangu00
$logevent       255
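
To rule out a simple mismatch between the two files on a node, a check along these lines could compare the $pbsserver entry with server_name. This is only a sketch; mom_server_matches is a hypothetical name, and it assumes both files use the layout shown above:

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) if the $pbsserver entry in
# a MOM config file names the same host as the server_name file.
#   $1 = path to server_name, $2 = path to mom_priv/config
mom_server_matches() {
    server_name_file=$1
    mom_config_file=$2
    expected=$(head -n 1 "$server_name_file")
    # Single quotes keep the shell from expanding $pbsserver; awk compares
    # the first field against that literal string.
    configured=$(awk '$1 == "$pbsserver" { print $2; exit }' "$mom_config_file")
    [ -n "$expected" ] && [ "$expected" = "$configured" ]
}
```

On a node this could be run as mom_server_matches /var/spool/torque/server_name /var/spool/torque/mom_priv/config && echo "server names agree".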