[torqueusers] unable to contact node, Connection refused
Garrick Staples
garrick at usc.edu
Fri Nov 4 11:55:00 MST 2005
On Fri, Nov 04, 2005 at 08:50:08AM -0500, Alejandro Hurtado Turiño alleged:
> Hi,
> I've installed torque-1.1.0p6 on a cluster, but the jobs don't run
> unless forced with qrun. I'm not planning on installing Maui;
> I'm just using the default FIFO scheduler (pbs_sched).
1.1.0p6 is really old. There have been countless improvements since
then.
> The pbs server log says at startup:
> 10/31/2005 13:54:28;0006;PBS_Server;Svr;PBS_Server;Using ports Server:
> 15001 Scheduler:15004 MOM:15002
> 10/31/2005 13:54:28;0002;PBS_Server;Svr;PBS_Server;Server Ready, pid =
> 2317
> 10/31/2005 13:54:28;0004;PBS_Server;Svr;WARNING;!!! unable to contact node
> grid1 !!!
> 10/31/2005 13:54:28;0001;PBS_Server;Svr;PBS_Server;Connection refused
> (111) in contact_sched, Could not contact Scheduler - port 15004
Is pbs_mom running on grid1? Is pbs_sched running on the server?
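
One quick way to answer both questions is to check whether anything is listening on the ports named in the server log above. The following is only a rough sketch, not part of TORQUE: the `probe` and `report` helpers are made-up names, and the port numbers are simply copied from the log (Server 15001, MOM 15002, Scheduler 15004). An open port only shows that some process is listening there, not that it is the right daemon.

```python
import errno
import socket

# Port numbers copied from the quoted server log; adjust if yours differ.
PORTS = {"pbs_server": 15001, "pbs_mom": 15002, "pbs_sched": 15004}


def probe(host, port, timeout=2.0):
    """Return 'open', 'refused', or a short 'error: ...' string."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        # connect_ex returns 0 on success, otherwise an errno value
        code = sock.connect_ex((host, port))
    except OSError as exc:
        # e.g. the host name could not be resolved, or the attempt timed out
        return "error: %s" % exc
    finally:
        sock.close()
    if code == 0:
        return "open"
    if code == errno.ECONNREFUSED:
        # Same condition the server logged as "Connection refused (111)"
        return "refused"
    return "error: %s" % errno.errorcode.get(code, code)


def report(host):
    """Print one line per daemon port for the given host."""
    for name, port in PORTS.items():
        print("%-10s port %5d: %s" % (name, port, probe(host, port)))
```

Calling `report("grid1")` from the server host would print one line per daemon. A "refused" next to pbs_sched or pbs_mom would suggest that daemon simply isn't running (or is listening on a different port).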
> ---
> grid1 is the pbs server with pbsmon installed.
> There is no firewall.
> My mom-priv/config:
> $clienthost grid1
> $logevent 255
> $restricted grid1
> $usecp *:/data /data
Is grid1 a node or server? The information above is confusing.
The server logs indicate that it is a node. The MOM config looks like
grid1 is the server.
And you don't need the $restricted line; it just weakens security.
> Searching the web, I see the problem is common, but nobody answers
> it.
> Could anybody help me, please?
> thanks
> ale
These kinds of things are just config errors that are hard to diagnose
over email. Eventually the admin figures it out and doesn't tell anyone
:)
--
Garrick Staples, Linux/HPCC Administrator
University of Southern California