[torqueusers] Configuring Torque for local-only use on Mac OS X

Kevin Murphy murphy at genome.chop.edu
Wed Jun 11 07:43:43 MDT 2008


I'm hoping to learn about some of the Torque (2.3.0) features locally on 
my Mac OS X Leopard (10.5.3) laptop, but I haven't been able to get a 
test job to actually run.  Before running pbs_server, pbs_mom, and 
pbs_sched, I run 'sudo hostname localhost' (otherwise, I can't even 
start these processes).  I am a Torque newbie but have been doing my 
best with the manual and wiki.  The fundamental problem seems to be that 
pbs_server is unable to contact node 'localhost'.

$ qmgr -c 'p s'
#
# Create queues and set their attributes.
#
#
# Create and define queue batch
#
create queue batch
set queue batch queue_type = Execution
set queue batch acl_host_enable = False
set queue batch acl_hosts = localhost
set queue batch enabled = True
set queue batch started = True
#
# Set server attributes.
#
set server scheduling = True
set server acl_hosts = localhost
set server operators = murphy@localhost
set server operators += root@localhost
set server operators += username@localhost
set server default_queue = batch
set server log_events = 511
set server mail_from = adm
set server scheduler_iteration = 600
set server node_ping_rate = 10
set server node_check_rate = 10
set server tcp_timeout = 6
set server next_job_number = 15

Here are log snippets showing events before and after a qsub (the dashed 
line is where the qsub occurs):

pbs_server log:

06/10/2008 22:41:09;0004;PBS_Server;Svr;WARNING;ALERT: unable to contact node localhost
- - -
06/10/2008 22:51:41;0100;PBS_Server;Job;15.localhost;enqueuing into batch, state 1 hop 1
06/10/2008 22:51:41;0008;PBS_Server;Job;15.localhost;Job Queued at request of murphy@localhost, owner = murphy@localhost, job name = junk.sh, queue = batch
06/10/2008 22:51:41;0040;PBS_Server;Svr;localhost;Scheduler sent command new
06/10/2008 22:51:41;0008;PBS_Server;Job;15.localhost;Job Modified at request of Scheduler@localhost
06/10/2008 22:51:41;0008;PBS_Server;Job;15.localhost;Job Run at request of Scheduler@localhost
06/10/2008 22:51:41;0040;PBS_Server;Svr;localhost;Scheduler sent command recyc
06/10/2008 22:51:41;0010;PBS_Server;Job;15.localhost;Exit_status=0 resources_used.cput=00:00:00 resources_used.mem=0kb resources_used.vmem=0kb resources_used.walltime=00:00:00
06/10/2008 22:51:59;0004;PBS_Server;Svr;check_nodes;node localhost not detected in 20 seconds, marking node down
06/10/2008 22:52:39;0004;PBS_Server;Svr;check_nodes;node localhost not detected in 15 seconds, marking node down

pbs_mom log:

06/10/2008 22:41:06;0002;   pbs_mom;Svr;Log;Log opened
06/10/2008 22:41:06;0001;   pbs_mom;Svr;pbs_mom;No such file or directory (2) in read_config, fstat: config
06/10/2008 22:41:06;0002;   pbs_mom;Svr;setpbsserver;localhost
06/10/2008 22:41:06;0002;   pbs_mom;Svr;mom_server_add;server localhost added
06/10/2008 22:41:06;0002;   pbs_mom;n/a;initialize;independent
06/10/2008 22:41:06;0080;   pbs_mom;Svr;pbs_mom;before init_abort_jobs
06/10/2008 22:41:06;0002;   pbs_mom;Svr;pbs_mom;Is up
06/10/2008 22:41:06;0002;   pbs_mom;Svr;setup_program_environment;MOM executable path and mtime at launch: /usr/local/sbin/pbs_mom 1211235964
06/10/2008 22:41:06;0002;   pbs_mom;n/a;mom_server_check_connections;hello sent to server localhost
- - -
06/10/2008 22:51:41;0001;   pbs_mom;Job;TMomFinalizeJob3;job 15.localhost started, pid = 1359
06/10/2008 22:51:41;0080;   pbs_mom;Job;15.localhost;task 1 terminated
06/10/2008 22:51:41;0008;   pbs_mom;Job;15.localhost;job was terminated
06/10/2008 22:51:41;0080;   pbs_mom;Svr;preobit_reply;top of preobit_reply
06/10/2008 22:51:41;0080;   pbs_mom;Svr;preobit_reply;DIS_reply_read/decode_DIS_replySvr worked, top of while loop
06/10/2008 22:51:41;0080;   pbs_mom;Svr;preobit_reply;in while loop, no error from job stat
06/10/2008 22:51:41;0080;   pbs_mom;Job;15.localhost;obit sent to server

pbs_sched log:

06/10/2008 22:41:12;0002; pbs_sched;Svr;Log;Log opened
06/10/2008 22:41:12;0002; pbs_sched;Svr;TokenAct;Account file /var/spool/torque/sched_priv/accounting/20080610 opened
06/10/2008 22:41:12;0002; pbs_sched;Svr;main;pbs_sched startup pid 1300
06/10/2008 22:51:09;0080; pbs_sched;Svr;main;brk point 2625536
- - -
06/10/2008 22:51:41;0040; pbs_sched;Job;15.localhost;Job Run
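
For anyone who wants to line the three logs up against each other, here is a minimal Python sketch that splits a record into its six semicolon-separated fields. It assumes one record per line, as the daemons write them to their log files (not the mail-wrapped form). The field names in LogRecord are my own labels, not names taken from Torque.

```python
from typing import NamedTuple, Optional


class LogRecord(NamedTuple):
    # Field names are illustrative labels, not Torque's own terminology.
    timestamp: str
    event: str
    daemon: str
    obj_class: str
    obj_name: str
    message: str


def parse_record(line: str) -> Optional[LogRecord]:
    """Split one log line into six fields; return None if it does not fit."""
    # Split at most 5 times so semicolons inside the message are preserved.
    parts = line.rstrip("\n").split(";", 5)
    if len(parts) != 6:
        return None
    return LogRecord(*(p.strip() for p in parts))


if __name__ == "__main__":
    sample = [
        "06/10/2008 22:41:09;0004;PBS_Server;Svr;WARNING;ALERT: unable to contact node localhost",
        "06/10/2008 22:41:06;0002;   pbs_mom;Svr;pbs_mom;Is up",
        "- - -",  # the qsub marker used above; not a log record
    ]
    for line in sample:
        rec = parse_record(line)
        if rec is not None:
            print(rec.timestamp, rec.daemon, rec.obj_name, rec.message, sep=" | ")
```

Sorting the parsed records from all three logs by timestamp gives a single merged timeline around the qsub.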


$ sudo lsof | grep pbs | grep LISTEN
pbs_mom   1296           root    5u     IPv4 0x873de64       0t0       TCP *:15002 (LISTEN)
pbs_mom   1296           root    6u     IPv4 0x8732a68       0t0       TCP *:15003 (LISTEN)
pbs_serve 1298           root    6u     IPv4 0x8a48a68       0t0       TCP *:15001 (LISTEN)
pbs_sched 1300           root    4u     IPv4 0x8a49270       0t0       TCP localhost:15004 (LISTEN)
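
As a cross-check on the lsof output, here is a minimal Python sketch that tries a TCP connection to each of the ports listed above. The port-to-daemon mapping is copied from that listing; the function names are hypothetical. It only shows whether a TCP connection to 'localhost' succeeds, so it says nothing about how pbs_server itself resolves or polls the node.

```python
import socket

# Port-to-daemon mapping taken from the lsof listing above.
PORTS = {
    15001: "pbs_server",
    15002: "pbs_mom",
    15003: "pbs_mom",
    15004: "pbs_sched",
}


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_ports(host: str = "localhost") -> dict:
    """Map each known port to whether it accepted a connection."""
    return {port: can_connect(host, port) for port in PORTS}


if __name__ == "__main__":
    for port, ok in check_ports().items():
        state = "reachable" if ok else "NOT reachable"
        print(f"{PORTS[port]:<10} port {port}: {state}")
```

If every port is reachable here but pbs_server still marks the node down, the problem is more likely in name resolution or node configuration than in the listeners themselves.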

Thanks,
Kevin Murphy


