[torqueusers] Re: Disappearing Nodes

gianfranco sciacca gs at hep.ucl.ac.uk
Wed Mar 30 08:53:02 MST 2005


In my case ports 15001 to 15004 are open in the firewall on both
machines of my test cluster. Indeed, the node is assigned jobs and
executes them, provided it is marked free manually. It then returns to
the down state.
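
For anyone wanting to double-check the firewall side of this, here is a minimal sketch that tries a TCP connection to each of the ports mentioned above, run from one machine against the other. The function name check_tcp_ports and the default timeout are my own choices for illustration, not part of TORQUE. It only tests TCP, and a successful connection only shows that something is listening and the firewall lets the connection through, not that the daemons are talking to each other correctly.

```python
import socket

# Ports named in this message; which daemon listens on which port
# depends on the local setup, so treat the range as an assumption.
PORTS = range(15001, 15005)


def check_tcp_ports(host, ports=PORTS, timeout=2.0):
    """Return {port: bool}, True if a TCP connection to host:port succeeds."""
    results = {}
    for port in ports:
        try:
            # create_connection raises OSError on refusal, timeout,
            # or name-resolution failure.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            results[port] = False
    return results


if __name__ == "__main__":
    import sys

    # Hostname is taken from the command line; "localhost" is a placeholder.
    host = sys.argv[1] if len(sys.argv) > 1 else "localhost"
    for port, ok in check_tcp_ports(host).items():
        print(f"{host}:{port} {'reachable' if ok else 'NOT reachable'}")
```

Running it in both directions (server to node and node to server) would show whether the rules are symmetric. A port reported as not reachable may simply have no daemon listening on that machine.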

Searching the archives, I've seen the message "no hello/cluster-addrs
messages received from server" (which I get when probing the node with
momctl) come up a few times, but no solution was ever posted.

How can I get round this? I should probably mention that I've followed
the quickstart guide by the numbers. In addition, I have configured the
server and adjusted the firewall as mentioned above. Is there an
additional step needed to get started?

cheers, gianfranco

On Thu, 2005-03-24 at 06:50, Hannu Väisänen wrote:
> On Wed, Mar 23, 2005 at 10:04:14AM -0500, Jeremy Stout wrote:
> > Hello. Over the weekend, I noticed that the nodes on my cluster would
> > disappear and come back every few minutes. When they would disappear,
> > the status would often appear as "down".
> 
> This may have something to do with firewalls.
> My nodes disappear soon after I disable the firewall, and then
> pbs_mom log shows something like this
> 
> pbs_mom;Svr;pbs_mom;No child processes (10) in is_update_stat, cannot specify protocol version
> pbs_mom;Svr;pbs_mom;im_eof, Premature end of message from addr xxx.xxx.xxx.xxx:15001
>                                                   Server node->===============
> 
> Port 15001 is enabled in firewall.
