[torqueusers] Configuring Torque for local-only use on Mac OS X

Jim Prewett download at hpc.unm.edu
Wed Jun 11 08:29:00 MDT 2008


Hi Kevin,

What happens when you run 'ping localhost'?

Do you have an entry for localhost in /etc/hosts?  Personally, I would 
make sure that I had an entry for my hostname in /etc/hosts and use that 
hostname as the server hostname.
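
One quick way to check the /etc/hosts side of this from a shell is sketched 
below.  It assumes a POSIX sh with awk, and hosts_has_entry is just a made-up 
helper name for illustration, not something that ships with Torque or Mac OS X:

```shell
# Report whether a hosts-format file has an entry for the given name.
# Usage: hosts_has_entry NAME [FILE]   (FILE defaults to /etc/hosts)
# hosts_has_entry is a hypothetical helper, named here for illustration only.
hosts_has_entry() {
    name=$1
    file=${2:-/etc/hosts}
    # Strip comments, then look for NAME in any field after the address.
    awk -v name="$name" '
        { sub(/#.*/, "") }
        { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit found ? 0 : 1 }
    ' "$file"
}

# Check both localhost and whatever the machine currently calls itself.
for h in localhost "$(hostname)"; do
    if hosts_has_entry "$h"; then
        echo "$h: found in /etc/hosts"
    else
        echo "$h: no entry in /etc/hosts"
    fi
done
```

This only looks at /etc/hosts itself; it says nothing about DNS or any other 
lookup source, so 'ping' against the same name is still worth trying.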

Good luck,
Jim

James E. Prewett                    Jim at Prewett.org download at hpc.unm.edu 
Systems Team Leader           LoGS: http://www.hpc.unm.edu/~download/LoGS/ 
Designated Security Officer         OpenPGP key: pub 1024D/31816D93    
HPC Systems Engineer III   UNM HPC  505.277.8210

On Wed, 11 Jun 2008, Kevin Murphy wrote:

> I'm hoping to learn about some of the Torque (2.3.0) features locally on my
> Mac OS X Leopard (10.5.3) laptop, but I haven't been able to get a test job to
> actually run.  Before running pbs_server, pbs_mom, and pbs_sched, I run 'sudo
> hostname localhost' (otherwise, I can't even start these processes).  I am a
> Torque newbie but have been doing my best with the manual and wiki.  The
> fundamental problem seems to be that pbs_server is unable to contact node
> 'localhost'.
> 
> $ qmgr -c 'p s'
> #
> # Create queues and set their attributes.
> #
> #
> # Create and define queue batch
> #
> create queue batch
> set queue batch queue_type = Execution
> set queue batch acl_host_enable = False
> set queue batch acl_hosts = localhost
> set queue batch enabled = True
> set queue batch started = True
> #
> # Set server attributes.
> #
> set server scheduling = True
> set server acl_hosts = localhost
> set server operators = murphy@localhost
> set server operators += root@localhost
> set server operators += username@localhost
> set server default_queue = batch
> set server log_events = 511
> set server mail_from = adm
> set server scheduler_iteration = 600
> set server node_ping_rate = 10
> set server node_check_rate = 10
> set server tcp_timeout = 6
> set server next_job_number = 15
> 
> Here are log snippets showing events before and after a qsub (the dashed line
> is where the qsub occurs):
> 
> pbs_server log:
> 
> 06/10/2008 22:41:09;0004;PBS_Server;Svr;WARNING;ALERT: unable to contact node localhost
> - - -
> 06/10/2008 22:51:41;0100;PBS_Server;Job;15.localhost;enqueuing into batch, state 1 hop 1
> 06/10/2008 22:51:41;0008;PBS_Server;Job;15.localhost;Job Queued at request of murphy@localhost, owner = murphy@localhost, job name = junk.sh, queue = batch
> 06/10/2008 22:51:41;0040;PBS_Server;Svr;localhost;Scheduler sent command new
> 06/10/2008 22:51:41;0008;PBS_Server;Job;15.localhost;Job Modified at request of Scheduler@localhost
> 06/10/2008 22:51:41;0008;PBS_Server;Job;15.localhost;Job Run at request of Scheduler@localhost
> 06/10/2008 22:51:41;0040;PBS_Server;Svr;localhost;Scheduler sent command recyc
> 06/10/2008 22:51:41;0010;PBS_Server;Job;15.localhost;Exit_status=0 resources_used.cput=00:00:00 resources_used.mem=0kb resources_used.vmem=0kb resources_used.walltime=00:00:00
> 06/10/2008 22:51:59;0004;PBS_Server;Svr;check_nodes;node localhost not detected in 20 seconds, marking node down
> 06/10/2008 22:52:39;0004;PBS_Server;Svr;check_nodes;node localhost not detected in 15 seconds, marking node down
> 
> pbs_mom log:
> 
> 06/10/2008 22:41:06;0002;   pbs_mom;Svr;Log;Log opened
> 06/10/2008 22:41:06;0001;   pbs_mom;Svr;pbs_mom;No such file or directory (2) in read_config, fstat: config
> 06/10/2008 22:41:06;0002;   pbs_mom;Svr;setpbsserver;localhost
> 06/10/2008 22:41:06;0002;   pbs_mom;Svr;mom_server_add;server localhost added
> 06/10/2008 22:41:06;0002;   pbs_mom;n/a;initialize;independent
> 06/10/2008 22:41:06;0080;   pbs_mom;Svr;pbs_mom;before init_abort_jobs
> 06/10/2008 22:41:06;0002;   pbs_mom;Svr;pbs_mom;Is up
> 06/10/2008 22:41:06;0002;   pbs_mom;Svr;setup_program_environment;MOM executable path and mtime at launch: /usr/local/sbin/pbs_mom 1211235964
> 06/10/2008 22:41:06;0002;   pbs_mom;n/a;mom_server_check_connections;hello sent to server localhost
> - - -
> 06/10/2008 22:51:41;0001;   pbs_mom;Job;TMomFinalizeJob3;job 15.localhost started, pid = 1359
> 06/10/2008 22:51:41;0080;   pbs_mom;Job;15.localhost;task 1 terminated
> 06/10/2008 22:51:41;0008;   pbs_mom;Job;15.localhost;job was terminated
> 06/10/2008 22:51:41;0080;   pbs_mom;Svr;preobit_reply;top of preobit_reply
> 06/10/2008 22:51:41;0080;   pbs_mom;Svr;preobit_reply;DIS_reply_read/decode_DIS_replySvr worked, top of while loop
> 06/10/2008 22:51:41;0080;   pbs_mom;Svr;preobit_reply;in while loop, no error from job stat
> 06/10/2008 22:51:41;0080;   pbs_mom;Job;15.localhost;obit sent to server
> 
> pbs_sched log:
> 
> 06/10/2008 22:41:12;0002; pbs_sched;Svr;Log;Log opened
> 06/10/2008 22:41:12;0002; pbs_sched;Svr;TokenAct;Account file /var/spool/torque/sched_priv/accounting/20080610 opened
> 06/10/2008 22:41:12;0002; pbs_sched;Svr;main;pbs_sched startup pid 1300
> 06/10/2008 22:51:09;0080; pbs_sched;Svr;main;brk point 2625536
> - - -
> 06/10/2008 22:51:41;0040; pbs_sched;Job;15.localhost;Job Run
> 
> 
> $ sudo lsof | grep pbs | grep LISTEN
> pbs_mom   1296           root    5u     IPv4 0x873de64       0t0       TCP *:15002 (LISTEN)
> pbs_mom   1296           root    6u     IPv4 0x8732a68       0t0       TCP *:15003 (LISTEN)
> pbs_serve 1298           root    6u     IPv4 0x8a48a68       0t0       TCP *:15001 (LISTEN)
> pbs_sched 1300           root    4u     IPv4 0x8a49270       0t0       TCP localhost:15004 (LISTEN)
> 
> Thanks,
> Kevin Murphy
> 
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers
> 

