[torqueusers] Re: Disappearing Nodes

Garrick Staples garrick at usc.edu
Wed Mar 30 09:29:10 MST 2005

On Wed, Mar 30, 2005 at 04:53:02PM +0100, gianfranco sciacca alleged:
> On Thu, 2005-03-24 at 06:50, Hannu Väisänen wrote:
> > On Wed, Mar 23, 2005 at 10:04:14AM -0500, Jeremy Stout wrote:
> > > Hello. Over the weekend, I noticed that the nodes on my cluster would
> > > disappear and come back every few minutes. When they would disappear,
> > > the status would often appear as "down".
> > 
> > This may have something to do with firewalls.
> > My nodes disappear soon after I disable the firewall, and then
> > pbs_mom log shows something like this
> > 
> > pbs_mom;Svr;pbs_mom;No child processes (10) in is_update_stat, cannot specify protocol version
> > pbs_mom;Svr;pbs_mom;im_eof, Premature end of message from addr xxx.xxx.xxx.xxx:15001
> >                                                   Server node->===============
> > 
> > Port 15001 is enabled in firewall.
[...awkward top-posting edited..]
> In my case ports 15001 to 15004 are open in the firewall on both
> machines of my test cluster. Indeed, the node is assigned jobs and
> executes them, provided it is marked free manually. It then returns to
> the down state.
> Searching the archives, I've seen the issue "no hello/cluster-addrs
> messages received from server" (which I get probing the node with
> momctl) mentioned a few times, but a possible solution was never
> mentioned.

I'm pretty sure all of those cases were caused by net filtering.
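
One quick way to see whether filtering is in the way is to test TCP
reachability of the ports in question from each machine to the other. Below
is a minimal sketch in Python, not anything that ships with torque: the
function names port_open and check_ports are made up for illustration, and
the port range 15001 to 15004 is simply the one mentioned in the quoted
message. It only tests plain TCP connects from an unprivileged source port,
so a clean result does not prove that all the traffic torque needs gets
through.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_ports(host, ports, timeout=2.0):
    """Map each port to True (reachable) or False (refused, filtered, or timed out)."""
    return {port: port_open(host, port, timeout) for port in ports}
```

For example, check_ports("torque-server", range(15001, 15005)) run on a
compute node, where "torque-server" is a placeholder host name, returns a
dict showing which of those ports accepted a connection. A False entry
can also mean that nothing is listening on that port on that machine, so
run it in both directions before blaming the firewall.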

> How to get round this? I should probably mention that I've followed the
> quickstart guide by the numbers. In addition I have configured the
> server and adjusted the firewall as mentioned above. There seems to be
> an additional step to get started?

I suspect you also need ports 512 through 1024 open.  But I'd just disable all
net filtering between your nodes and torque server.  There's a

Garrick Staples, Linux/HPCC Administrator
University of Southern California