[torqueusers] Configuring Torque for local-only use on Mac OS X
Kevin Murphy
murphy at genome.chop.edu
Wed Jun 11 07:43:43 MDT 2008
I'm hoping to learn about some Torque (2.3.0) features by running it
locally on my Mac OS X Leopard (10.5.3) laptop, but I haven't been able
to get a test job to actually run. Before starting pbs_server, pbs_mom,
and pbs_sched, I run 'sudo hostname localhost'; without that, the
daemons won't even start. I am a Torque newbie but have been doing my
best with the manual and wiki. The fundamental problem seems to be that
pbs_server is unable to contact node 'localhost'. My server
configuration:
$ qmgr -c 'p s'
#
# Create queues and set their attributes.
#
#
# Create and define queue batch
#
create queue batch
set queue batch queue_type = Execution
set queue batch acl_host_enable = False
set queue batch acl_hosts = localhost
set queue batch enabled = True
set queue batch started = True
#
# Set server attributes.
#
set server scheduling = True
set server acl_hosts = localhost
set server operators = murphy@localhost
set server operators += root@localhost
set server operators += username@localhost
set server default_queue = batch
set server log_events = 511
set server mail_from = adm
set server scheduler_iteration = 600
set server node_ping_rate = 10
set server node_check_rate = 10
set server tcp_timeout = 6
set server next_job_number = 15
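A dump like this can be loaded into a dictionary, which makes it easier to compare settings between runs. The following is only a minimal Python sketch: `parse_qmgr_dump` is a hypothetical helper, not part of Torque, and it assumes each setting sits on a single `set ...` line, as in the output above.

```python
import re

# Matches 'set server attr = value' and 'set queue NAME attr = value'.
# '+=' appends to an attribute that already has a value.
SET_LINE = re.compile(r"^set\s+(server|queue\s+\S+)\s+(\S+)\s*(\+?=)\s*(.*)$")


def parse_qmgr_dump(text):
    """Return {scope: {attribute: [values]}} for every 'set' line in text."""
    settings = {}
    for raw in text.splitlines():
        match = SET_LINE.match(raw.strip())
        if match is None:
            continue  # comments, blank lines, 'create queue ...'
        scope, attr, op, value = match.groups()
        scope = " ".join(scope.split())  # normalise spacing in 'queue NAME'
        values = settings.setdefault(scope, {}).setdefault(attr, [])
        if op == "=":
            values.clear()  # a plain '=' replaces any earlier values
        values.append(value.strip())
    return settings


if __name__ == "__main__":
    import sys
    # Example: qmgr -c 'p s' | python this_script.py
    for scope, attrs in parse_qmgr_dump(sys.stdin.read()).items():
        for attr, values in attrs.items():
            print(scope, attr, values)
```

Every value is kept as a list so that attributes built up with `+=` (such as `operators`) and single-valued ones are handled the same way.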
Here are log snippets showing events before and after a qsub (the dashed
line is where the qsub occurs):
pbs_server log:
06/10/2008 22:41:09;0004;PBS_Server;Svr;WARNING;ALERT: unable to contact node localhost
- - -
06/10/2008 22:51:41;0100;PBS_Server;Job;15.localhost;enqueuing into batch, state 1 hop 1
06/10/2008 22:51:41;0008;PBS_Server;Job;15.localhost;Job Queued at request of murphy@localhost, owner = murphy@localhost, job name = junk.sh, queue = batch
06/10/2008 22:51:41;0040;PBS_Server;Svr;localhost;Scheduler sent command new
06/10/2008 22:51:41;0008;PBS_Server;Job;15.localhost;Job Modified at request of Scheduler@localhost
06/10/2008 22:51:41;0008;PBS_Server;Job;15.localhost;Job Run at request of Scheduler@localhost
06/10/2008 22:51:41;0040;PBS_Server;Svr;localhost;Scheduler sent command recyc
06/10/2008 22:51:41;0010;PBS_Server;Job;15.localhost;Exit_status=0 resources_used.cput=00:00:00 resources_used.mem=0kb resources_used.vmem=0kb resources_used.walltime=00:00:00
06/10/2008 22:51:59;0004;PBS_Server;Svr;check_nodes;node localhost not detected in 20 seconds, marking node down
06/10/2008 22:52:39;0004;PBS_Server;Svr;check_nodes;node localhost not detected in 15 seconds, marking node down
pbs_mom log:
06/10/2008 22:41:06;0002; pbs_mom;Svr;Log;Log opened
06/10/2008 22:41:06;0001; pbs_mom;Svr;pbs_mom;No such file or directory (2) in read_config, fstat: config
06/10/2008 22:41:06;0002; pbs_mom;Svr;setpbsserver;localhost
06/10/2008 22:41:06;0002; pbs_mom;Svr;mom_server_add;server localhost added
06/10/2008 22:41:06;0002; pbs_mom;n/a;initialize;independent
06/10/2008 22:41:06;0080; pbs_mom;Svr;pbs_mom;before init_abort_jobs
06/10/2008 22:41:06;0002; pbs_mom;Svr;pbs_mom;Is up
06/10/2008 22:41:06;0002; pbs_mom;Svr;setup_program_environment;MOM executable path and mtime at launch: /usr/local/sbin/pbs_mom 1211235964
06/10/2008 22:41:06;0002; pbs_mom;n/a;mom_server_check_connections;hello sent to server localhost
- - -
06/10/2008 22:51:41;0001; pbs_mom;Job;TMomFinalizeJob3;job 15.localhost started, pid = 1359
06/10/2008 22:51:41;0080; pbs_mom;Job;15.localhost;task 1 terminated
06/10/2008 22:51:41;0008; pbs_mom;Job;15.localhost;job was terminated
06/10/2008 22:51:41;0080; pbs_mom;Svr;preobit_reply;top of preobit_reply
06/10/2008 22:51:41;0080; pbs_mom;Svr;preobit_reply;DIS_reply_read/decode_DIS_replySvr worked, top of while loop
06/10/2008 22:51:41;0080; pbs_mom;Svr;preobit_reply;in while loop, no error from job stat
06/10/2008 22:51:41;0080; pbs_mom;Job;15.localhost;obit sent to server
pbs_sched log:
06/10/2008 22:41:12;0002; pbs_sched;Svr;Log;Log opened
06/10/2008 22:41:12;0002; pbs_sched;Svr;TokenAct;Account file /var/spool/torque/sched_priv/accounting/20080610 opened
06/10/2008 22:41:12;0002; pbs_sched;Svr;main;pbs_sched startup pid 1300
06/10/2008 22:51:09;0080; pbs_sched;Svr;main;brk point 2625536
- - -
06/10/2008 22:51:41;0040; pbs_sched;Job;15.localhost;Job Run
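The records in all three logs are semicolon-separated, so they can be split into fields and filtered, for example to pull out only the "marking node down" messages. The sketch below is in Python and rests on a few assumptions: each record occupies one line (as in the log files themselves, before mail wrapping), the second field is read as a hexadecimal event code, and the field names in `LogRecord` are labels chosen here, not names taken from the Torque documentation.

```python
from collections import namedtuple

# Field names are illustrative labels, not official Torque terminology.
LogRecord = namedtuple("LogRecord", "timestamp event daemon kind ident message")


def parse_log_line(line):
    """Split one log record into fields; return None for non-record lines."""
    parts = line.strip().split(";", 5)  # the message itself may contain ';'
    if len(parts) != 6:
        return None  # e.g. the '- - -' separator
    timestamp, event, daemon, kind, ident, message = (p.strip() for p in parts)
    try:
        code = int(event, 16)  # assumption: event code is hexadecimal
    except ValueError:
        return None
    return LogRecord(timestamp, code, daemon, kind, ident, message)


def node_down_events(lines):
    """Return the records whose message reports a node being marked down."""
    records = (parse_log_line(line) for line in lines)
    return [r for r in records if r and "marking node down" in r.message]
```

Calling `node_down_events(open(path))` on a server log file (the path depends on the installation) gives the timestamps at which pbs_server gave up on the node, which can then be lined up against the pbs_mom log.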
$ sudo lsof | grep pbs | grep LISTEN
pbs_mom   1296 root 5u IPv4 0x873de64 0t0 TCP *:15002 (LISTEN)
pbs_mom   1296 root 6u IPv4 0x8732a68 0t0 TCP *:15003 (LISTEN)
pbs_serve 1298 root 6u IPv4 0x8a48a68 0t0 TCP *:15001 (LISTEN)
pbs_sched 1300 root 4u IPv4 0x8a49270 0t0 TCP localhost:15004 (LISTEN)
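Listening sockets do not by themselves prove that a connection to the name 'localhost' succeeds, so a direct connection test may be a useful complement to lsof. Here is a minimal Python sketch: `port_is_open` is a hypothetical helper, and the port numbers are simply those from the lsof output above.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or name resolution failed
        return False


if __name__ == "__main__":
    # Ports taken from the lsof listing; adjust for other setups.
    for port in (15001, 15002, 15003, 15004):
        state = "open" if port_is_open("localhost", port) else "closed"
        print(port, state)
```

This only checks that a TCP connection can be opened; it says nothing about whether pbs_server and pbs_mom accept each other's requests once connected.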
Thanks,
Kevin Murphy