[torqueusers] using non-privileged ports
Ken Nielson
knielson at adaptivecomputing.com
Fri Oct 28 15:13:26 MDT 2011
----- Original Message -----
> From: "Martin Siegert" <siegert at sfu.ca>
> To: torqueusers at supercluster.org
> Sent: Friday, October 28, 2011 2:57:15 PM
> Subject: Re: [torqueusers] using non-privileged ports
>
> Hi,
>
> On Fri, Oct 28, 2011 at 10:39:56AM -0700, Martin Siegert wrote:
> > Hi,
> >
> > we just recompiled torque with
> >
> > --disable-privports
> >
> > (since we constantly ran out of ports). Now we have a different
> > problem which is just as bad:
> >
> > # qstat -an1
> > Connection timed out
> > qstat: cannot connect to server b0 (errno=110) Connection timed out
> >
> > This does not appear right away after starting the server, but after
> > a few hours of running. As far as I can tell the only way to get the
> > server out of this state is to restart it.
> >
> > But there must be many sites that run torque with --disable-privports.
> > Thus: what am I missing?
>
> We gave up: --disable-privports does not appear to be working. Now we
> are back to our previous problem (this is on the server - there are
> no connections in TIME_WAIT on the nodes):
>
> # netstat -na | grep 15002
> tcp 0 0 172.18.1.0:629 172.18.1.152:15002 TIME_WAIT
> tcp 0 0 172.18.1.0:701 172.18.1.152:15002 TIME_WAIT
> tcp 0 0 172.18.1.0:689 172.18.1.152:15002 TIME_WAIT
> tcp 0 0 172.18.1.0:685 172.18.1.152:15002 TIME_WAIT
> tcp 0 0 172.18.1.0:951 172.18.1.152:15002 TIME_WAIT
> tcp 0 0 172.18.1.0:979 172.18.1.152:15002 TIME_WAIT
> tcp 0 0 172.18.1.0:962 172.18.1.152:15002 TIME_WAIT
> tcp 0 0 172.18.1.0:669 172.18.1.154:15002 TIME_WAIT
> tcp 0 0 172.18.1.0:662 172.18.1.154:15002 TIME_WAIT
> tcp 0 0 172.18.1.0:804 172.18.1.154:15002 TIME_WAIT
> ...
> # netstat -na | grep 15002 | wc -l
> 974
>
> For some reason the mom-server connections are not closed correctly and
> we end up with all these sockets in the TIME_WAIT state. Note that there
> are even several for the same node. Consequently we run out of ports.
>
> Is this a torque problem?
>
>
> Now we work around the problem by setting
>
> net.ipv4.tcp_tw_recycle = 1
> net.ipv4.tcp_tw_reuse = 1
>
> Cheers,
> Martin
Martin,
Thanks for the information. I will see what is happening with this.
Ken
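
A minimal sketch for checking the pattern described in the quoted message (several TIME_WAIT sockets to the same node): the shell function below tallies TIME_WAIT connections per remote host from `netstat -na` output read on standard input. The function name `count_time_wait` is hypothetical. It assumes IPv4 addresses and the six-column Linux netstat layout seen in the quoted output; 15002 is the port from the original report.

```shell
# Hypothetical helper: count TIME_WAIT sockets per remote host for one port.
# Reads `netstat -na` style lines on stdin; assumes IPv4 "host:port" fields.
count_time_wait() {
    port="${1:-15002}"   # remote port to match; defaults to the port in the report
    awk -v port="$port" '
        # Column 5 is the foreign address, column 6 the TCP state.
        $6 == "TIME_WAIT" {
            n = split($5, a, ":")
            if (a[n] == port) count[a[1]]++
        }
        END {
            for (host in count) print count[host], host
        }
    ' | sort -k1,1nr -k2,2   # busiest hosts first, ties ordered by address
}
```

With the function defined, `netstat -na | count_time_wait 15002` would list the nodes holding the most TIME_WAIT sockets first.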