[torqueusers] Re: unable to contact node, Connection refused
Alejandro Hurtado Turiño
ale at cubaenergia.cu
Mon Nov 7 06:57:09 MST 2005
Thanks, Garrick, for your answer, but I still have the problem and I
have no idea how to solve it. Some time ago I wrote a guide for
installing the torque/PBS server, based on my own (brief) experience
and the manuals. Now I'm trying to do the installation following this
guide (I've included it at the end).
My server is also a node, and it runs the scheduler too (is that a problem?)
Thanks again,
Alejandro
----- the guide -----
TORQUE
1.-download the installation tarball and untar it:
tar xvfz torque-1.1.0p6.tar.gz
2.-go to the directory: cd torque-1.1.0p6
4.-run ./configure without options; pbs_home will then be
/usr/spool/PBS/
5.-Modify this file:
> vi buildutils/makedepend-sh
>>>> modify the "eval $CPP..." command (line 576 of 758): add the line 'grep -v ">$"' after the existing 'grep -v "$s\$"' line (that line may be longer in the actual file than shown here)
eval $CPP $arg_cc $d/$s $errout | \
sed -n -e "s;^\# [0-9][0-9 ]*\"\(.*\)\";$f: \1;p" | \
grep -v "$s\$" | \
grep -v ">$" | \
sed -e 's;\([^ :]*: [^ ]*\).*;\1;' \
>> $TMP
6.- make
7.- make install
8.- cd doc # documentation and man pages
9.- make install # or, in step 4, run ./configure --enable-docs
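
To show what the extra grep in step 5 does: as far as I understand it, the preprocessor can emit pseudo-file names such as <built-in>, and these end up in the generated dependency list, where make cannot handle them. The filter drops every line that ends in ">". The sample lines and the function name drop_pseudo_deps are invented for this illustration; they are not taken from the real makedepend-sh.

```shell
# drop_pseudo_deps: remove dependency lines that end in '>'
# (the same filter as the 'grep -v ">$"' line added in step 5).
drop_pseudo_deps() {
    grep -v '>$'
}

# Invented sample dependency lines; only the real source file survives.
printf '%s\n' \
    'foo.o: foo.c' \
    'foo.o: <built-in>' \
    'foo.o: <command line>' | drop_pseudo_deps
```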
--------Server and scheduler configuration
>torque.setup root
>echo 'grid1'>/usr/spool/PBS/server_priv/nodes # creating the nodes file
>qmgr -c 'create node worker2' # and worker3,....
-configuring the daemon startup
>vi /etc/init.d/pbs
**write the startup script for pbs, mom, sched
>chkconfig --add pbs
>chkconfig --level 345 pbs on
>qmgr
:s s acl_hosts=*.cubaenergia.cu
:s s acl_host_enable=true
:quit
>service pbs restart
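
For reference, the /etc/init.d/pbs script could look roughly like the sketch below. This is not the script I actually use, only a minimal outline. It assumes the daemons were installed in /usr/local/sbin (adjust SBIN if not), the chkconfig priorities 95/05 are arbitrary, and the helper names run and pbs_init and the DRY_RUN switch are made up for the sketch.

```shell
#!/bin/sh
# chkconfig: 345 95 05
# description: TORQUE/PBS server, MOM and scheduler (illustrative sketch)

# Assumption: daemons live in /usr/local/sbin; override SBIN otherwise.
SBIN=${SBIN:-/usr/local/sbin}

# run CMD...: execute the command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

pbs_init() {
    case "$1" in
        start)
            # Start order is a choice made for this sketch.
            run "$SBIN/pbs_mom"
            run "$SBIN/pbs_server"
            run "$SBIN/pbs_sched"
            ;;
        stop)
            run killall pbs_sched pbs_server pbs_mom
            ;;
        restart)
            pbs_init stop
            pbs_init start
            ;;
        *)
            echo "Usage: $0 {start|stop|restart}" >&2
            return 1
            ;;
    esac
}

# Dispatch only when an action was given on the command line.
if [ "$#" -gt 0 ]; then
    pbs_init "$@"
fi
```

Running it with DRY_RUN=1 only prints the commands, which is a safe way to check the paths before enabling the service.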
--------Setup MOM node ------
>mount cpmaster:/..../torque-1.1.0p6 /soft
>cd /soft/src/resmon/
>make install
>cd ../cmds
>make install
>cd ../iff
>make install
--------Config in MOM ------
>vi /etc/init.d/pbs-mom
**write the startup script for mom
>chkconfig --add pbs-mom
>chkconfig --level 345 pbs-mom on
>vi {PBSHOME}/mom_priv/config
$clienthost grid1
$logevent 255
$usecp *:/share /share
>service pbs-mom start
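
After starting the services, a quick check of whether anything is listening on the PBS ports could be done with something like the sketch below. The port numbers are the ones from the server log (15001 server, 15002 MOM, 15004 scheduler); the function name check_ports is made up, and it assumes netstat -tln prints the usual "address:port ... LISTEN" lines.

```shell
# check_ports: read `netstat -tln`-style output on stdin and report, for each
# port given as an argument, whether a LISTEN line mentions that port.
check_ports() {
    listing=$(cat)
    for port in "$@"; do
        if printf '%s\n' "$listing" | grep -Eq "[:.]$port[[:space:]].*LISTEN"; then
            echo "port $port: listening"
        else
            echo "port $port: NOT listening"
        fi
    done
}

# Typical use on the server:
#   netstat -tln | check_ports 15001 15002 15004
```

If a port is reported as not listening, the matching daemon is probably not running, which would fit a "Connection refused" message in the server log.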
-----------------------end of guide ----
-----Original Message-----
> Message: 2
> Date: Fri, 4 Nov 2005 10:55:00 -0800
> From: Garrick Staples <garrick at usc.edu>
> Subject: Re: [torqueusers] unable to contact node, Connection refused
> To: torqueusers at supercluster.org
> Message-ID: <20051104185500.GU14266 at polop.usc.edu>
> Content-Type: text/plain; charset="us-ascii"
>
> On Fri, Nov 04, 2005 at 08:50:08AM -0500, Alejandro Hurtado Turiño alleged:
> > Hi,
> > I've installed torque-1.1.0p6 on a cluster, but the jobs don't run
> > unless forced w/ qrun. I'm not planning on installing Maui,
> > just using the default fifo scheduler (pbs_sched).
>
> 1.1.0p6 is really old. There have been countless improvements since
> then.
>
>
> > The pbs server log says at startup:
> > 10/31/2005 13:54:28;0006;PBS_Server;Svr;PBS_Server;Using ports Server: 15001 Scheduler:15004 MOM:15002
> > 10/31/2005 13:54:28;0002;PBS_Server;Svr;PBS_Server;Server Ready, pid = 2317
> > 10/31/2005 13:54:28;0004;PBS_Server;Svr;WARNING;!!! unable to contact node grid1 !!!
> > 10/31/2005 13:54:28;0001;PBS_Server;Svr;PBS_Server;Connection refused (111) in contact_sched, Could not contact Scheduler - port 15004
>
> Is pbs_mom running on grid1? Is pbs_sched running on the server?
>
> > ---
> > grid1 is the pbs server with pbs_mom installed.
> > no firewall
> > my mom_priv/config:
> > $clienthost grid1
> > $logevent 255
> > $restricted grid1
> > $usecp *:/data /data
>
> Is grid1 a node or server? The information above is confusing.
>
> The server logs indicate that it is a node. The MOM config looks like
> grid1 is the server.
>
> And you don't need the $restricted line, that just weakens security.
>
>
> > Looking for it on the web, I see the problem is common but nobody
> > answers it.
> > Could anybody help me, please?
> > thanks
> > ale
>
> These kinds of things are just config errors that are hard to diagnose
> over email. Eventually the admin figures it out and doesn't tell
> anyone :)
>
> --
> Garrick Staples, Linux/HPCC Administrator
> University of Southern California