[torqueusers] Re: unable to contact node, Connection refused

Alejandro Hurtado Turiño ale at cubaenergia.cu
Mon Nov 7 06:57:09 MST 2005


Thanks, Garrick, for your answer, but I still have the problem and I 
don't have any idea how to solve it. Some time ago I wrote a guide for 
installing the TORQUE/PBS server, based on my own (short) 
experience and the manuals. Now I'm trying to do it by following this guide
(I include it at the end). 
My server is also a node and runs the scheduler (is that a problem?).
Thanks again,
Alejandro 
----- the guide -----
TORQUE

1.- Download the distribution and untar it:
	tar xvfz torque-1.1.0p6.tar.gz
2.- Go to the directory: cd torque-1.1.0p6
4.- Run ./configure without options; PBS_HOME will then be
/usr/spool/PBS/
5.- Modify this file:
> vi buildutils/makedepend-sh
>>>> In the "eval $CPP..." command (line 576 of 758), add the line
>>>> 'grep -v ">$" | \' after the 'grep -v "$s\$" | \' line (the existing
>>>> lines may be longer than shown here). Keep nothing after the trailing
>>>> backslashes, or the line continuation breaks.
       eval $CPP $arg_cc $d/$s $errout | \
       sed -n -e "s;^\# [0-9][0-9 ]*\"\(.*\)\";$f: \1;p" | \
       grep -v "$s\$" | \
       grep -v ">$" | \
       sed -e 's;\([^ :]*: [^ ]*\).*;\1;' \
       >> $TMP
6.- make
7.- make install
8.- cd doc         	# documentation and man pages
9.- make install	      # or pass --enable-docs to ./configure in step 4
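Step 5 is the least obvious part of the build. Our reading of it (an assumption; the guide does not say why): newer GCC preprocessors emit line markers for pseudo-files such as <built-in> and <command-line>, and the unpatched pipeline would turn those into dependencies that make cannot find. Each such name ends in ">", which is what the added grep drops. The sketch below re-creates the pipeline as a hypothetical shell function, filter_deps, with the sed expression re-quoted for readability, and runs it on made-up cpp output.

```shell
# Sketch: filter_deps is a hypothetical stand-in for the makedepend-sh
# pipeline. $1 = dependency target ($f), $2 = scanned source file ($s);
# cpp output is read from stdin.
filter_deps() {
    f=$1
    s=$2
    # cpp line markers  # 1 "file" ...  become  "target: file ..."
    sed -n -e 's;^# [0-9][0-9 ]*"\(.*\)";'"$f"': \1;p' |
    # drop the source file itself
    grep -v "$s\$" |
    # the step 5 addition: drop pseudo-files such as <built-in>
    grep -v '>$' |
    # keep only "target: file"
    sed -e 's;\([^ :]*: [^ ]*\).*;\1;'
}

# Made-up cpp output for foo.c; prints only "foo.o: /usr/include/stdio.h"
printf '%s\n' '# 1 "foo.c"' '# 1 "<built-in>"' '# 1 "<command-line>"' \
    '# 1 "/usr/include/stdio.h" 1 3 4' | filter_deps foo.o foo.c
```

Without the 'grep -v ">$"' stage, the same input would also yield "foo.o: <built-in>" and "foo.o: <command-line>".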
--------Configuration of server and sched
>torque.setup root 					
>echo 'grid1'>/usr/spool/PBS/server_priv/nodes	# create the nodes file
>qmgr -c 'create node worker2'        # and worker3, ...
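The qmgr command is repeated once per compute node. A minimal loop is sketched below; add_nodes is a hypothetical helper name, and any host names beyond worker2 and worker3 would be placeholders. Setting QMGR="echo qmgr" prints the commands instead of running them.

```shell
# Sketch: add_nodes is a hypothetical helper that repeats the guide's
# "qmgr -c 'create node ...'" for every host given on the command line.
# Set QMGR="echo qmgr" to print the commands instead of running them.
add_nodes() {
    for host in "$@"; do
        ${QMGR:-qmgr} -c "create node $host" || return 1
    done
}

# Example, run on the server (grid1) after torque.setup:
#   add_nodes worker2 worker3
```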
-configure the daemon startup
>vi /etc/init.d/pbs
**write the startup script for pbs_server, pbs_mom and pbs_sched

>chkconfig --add pbs
>chkconfig --level 345 pbs on
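The guide leaves the startup script itself unwritten. Below is a minimal sketch under these assumptions: the daemons were installed in /usr/local/sbin (adjust PBS_SBIN otherwise), each daemon backgrounds itself when started, pbs_server is stopped with "qterm -t quick" and the other daemons with SIGTERM via pkill. The "# chkconfig:" and "# description:" header lines are what "chkconfig --add" reads to create the runlevel links. On a compute node the same file with PBS_DAEMONS="pbs_mom" could serve as /etc/init.d/pbs-mom.

```shell
#!/bin/sh
#
# pbs        Start/stop the TORQUE daemons on this host (sketch only).
#
# chkconfig: 345 95 5
# description: TORQUE pbs_server, pbs_sched and pbs_mom
#
# Assumptions: daemons live in PBS_SBIN, and qterm is on the PATH.
# For /etc/init.d/pbs-mom on a compute node, set PBS_DAEMONS="pbs_mom".

PBS_SBIN=${PBS_SBIN:-/usr/local/sbin}
PBS_DAEMONS=${PBS_DAEMONS:-"pbs_server pbs_sched pbs_mom"}

pbs_start() {
    for d in $PBS_DAEMONS; do
        echo "Starting $d"
        "$PBS_SBIN/$d" || return 1       # each daemon backgrounds itself
    done
}

pbs_stop() {
    for d in $PBS_DAEMONS; do
        echo "Stopping $d"
        case $d in
            pbs_server) qterm -t quick ;;   # orderly server shutdown
            *)          pkill -x "$d" ;;    # SIGTERM to the daemon
        esac
    done
}

pbs_status() {
    rc=0
    for d in $PBS_DAEMONS; do
        if pgrep -x "$d" >/dev/null 2>&1; then
            echo "$d is running"
        else
            echo "$d is stopped"
            rc=3                           # LSB code for "not running"
        fi
    done
    return $rc
}

pbs_ctl() {
    case "$1" in
        start)   pbs_start ;;
        stop)    pbs_stop ;;
        restart) pbs_stop; sleep 2; pbs_start ;;
        status)  pbs_status ;;
        *)       echo "Usage: $0 {start|stop|restart|status}" >&2; return 2 ;;
    esac
}

# Act only when called with an action, as init and "service" do.
if [ $# -gt 0 ]; then
    pbs_ctl "$1"
    exit $?
fi
```

With the script in place, "service pbs status" gives a quick answer to whether pbs_sched and pbs_mom are actually running on grid1.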
>qmgr
:s s acl_hosts=*.cubaenergia.cu
:s s acl_host_enable=true
:quit
>service pbs restart

--------Setup  MOM node ------
>mount cpmaster:/..../torque-1.1.0p6 /soft
>cd /soft/src/resmon/
>make install
>cd ../cmds
>make install
>cd ../iff
>make install
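The three "make install" runs can be looped so that a failure in one directory stops the rest. install_mom_parts is a hypothetical helper; it honours a MAKE variable (default make), the usual convention, which also makes a dry run easy.

```shell
# Sketch: install_mom_parts is a hypothetical helper. It runs
# "make install" in each listed subdirectory of the shared source tree
# and stops at the first failure. MAKE defaults to make.
install_mom_parts() {
    src=$1
    shift
    for dir in "$@"; do
        echo "==> make install in $src/$dir"
        (cd "$src/$dir" && ${MAKE:-make} install) || return 1
    done
}

# On each MOM node, with the tree mounted on /soft as above:
#   install_mom_parts /soft/src resmon cmds iff
```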
--------Config in MOM ------
>vi /etc/init.d/pbs-mom
**write the startup script for pbs_mom
>chkconfig --add pbs-mom
>chkconfig --level 345 pbs-mom on
>vi {PBSHOME}/mom_priv/config
$clienthost     grid1
$logevent       255
$usecp          *:/share /share
>service pbs-mom start
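The MOM config can also be written non-interactively. write_mom_config is a hypothetical helper that reproduces the three lines above: $clienthost names the pbs_server host (grid1) allowed to contact this MOM, $logevent is a bitmask of event classes to log (255 sets the eight low bits, i.e. a verbose log), and $usecp *:/share /share tells the MOM that files under /share on any host are reachable locally, so it can deliver them with cp instead of a remote copy. As Garrick notes in the quoted reply, no $restricted line is needed. The path in the example assumes the node uses the same PBS_HOME as the server.

```shell
# Sketch: write_mom_config is a hypothetical helper that writes the three
# lines from the guide. $1 = config file, $2 = the pbs_server host.
write_mom_config() {
    printf '%s\n' \
        "\$clienthost     $2" \
        '$logevent       255' \
        '$usecp          *:/share /share' > "$1"
}

# Example, assuming PBS_HOME=/usr/spool/PBS on the node as well:
#   write_mom_config /usr/spool/PBS/mom_priv/config grid1
#   service pbs-mom restart
```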
-----------------------end of guide ---- 
-----Original Message-----

> Message: 2
> Date: Fri, 4 Nov 2005 10:55:00 -0800
> From: Garrick Staples <garrick at usc.edu>
> Subject: Re: [torqueusers] unable to contact node, Connection refused
> To: torqueusers at supercluster.org
> Message-ID: <20051104185500.GU14266 at polop.usc.edu>
> Content-Type: text/plain; charset="us-ascii"
> 
> On Fri, Nov 04, 2005 at 08:50:08AM -0500, Alejandro Hurtado Turiño alleged:
> > Hi,
> > I've installed torque-1.1.0p6 on a cluster, but the jobs don't run
> > unless forced with qrun. I'm not planning on installing Maui
> > and am just using the default FIFO scheduler (pbs_sched).
> 
> 1.1.0p6 is really old.  There have been countless improvements since
> then.
> 
> 
> > The pbs server log say at start up: 
> > 10/31/2005 13:54:28;0006;PBS_Server;Svr;PBS_Server;Using ports Server:15001  Scheduler:15004  MOM:15002
> > 10/31/2005 13:54:28;0002;PBS_Server;Svr;PBS_Server;Server Ready, pid = 2317
> > 10/31/2005 13:54:28;0004;PBS_Server;Svr;WARNING;!!! unable to contact node grid1 !!!
> > 10/31/2005 13:54:28;0001;PBS_Server;Svr;PBS_Server;Connection refused (111) in contact_sched, Could not contact Scheduler - port 15004
> 
> Is pbs_mom running on grid1?  Is pbs_sched running on the server?
> 
> > ---
> > grid1 is the pbs server with pbsmon installed.
> > no firewall
> > my mom_priv/config
> > $clienthost    grid1
> > $logevent      255
> > $restricted    grid1
> > $usecp         *:/data /data
> 
> Is grid1 a node or server?  The information above is confusing.
> 
> The server logs indicate that it is a node.  The MOM config looks like
> grid1 is the server.
> 
> And you don't need the $restricted line, that just weakens security.
> 
> 
> > Searching the web, I see the problem is common but nobody answers
> > it. Could anybody help me, please?
> > thanks 
> > ale
> 
> These kinds of things are just config errors that are hard to diagnose
> over email.  Eventually the admin figures it out and doesn't tell
> anyone :)
> 
> -- 
> Garrick Staples, Linux/HPCC Administrator
> University of Southern California
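Garrick's two questions map directly onto the two failures in the quoted log: "unable to contact node grid1" points at pbs_mom on grid1, and "Connection refused (111)" on port 15004 means nothing was listening on the scheduler port (111 is ECONNREFUSED on Linux), i.e. pbs_sched was most likely not running. A hypothetical helper that scans a server log for both messages could look like this; the file name in the example assumes PBS_HOME=/usr/spool/PBS and one server_logs file per day.

```shell
# Sketch: diagnose_server_log is a hypothetical helper. It reads pbs_server
# log files (or stdin) and reports the two failures seen in this thread.
diagnose_server_log() {
    sed -n \
        -e 's/.*unable to contact node \([^ ]*\) !!!.*/check pbs_mom on node \1/p' \
        -e 's/.*Could not contact Scheduler - port \([0-9]*\).*/check pbs_sched (scheduler port \1) on the server/p' \
        "$@"
}

# Example:
#   diagnose_server_log /usr/spool/PBS/server_logs/20051031
```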