[torqueusers] Torque 2.1.x pbs_server process hogging cpu

garrick at speculation.org garrick at speculation.org
Thu Jun 15 02:38:52 MDT 2006

On Thu, Jun 15, 2006 at 10:10:06AM +0200, Martin Schafföner alleged:
> On Wednesday 14 June 2006 23:24, garrick at speculation.org wrote:
> > On Wed, Jun 14, 2006 at 09:59:25AM +0200, Martin Schafföner alleged:
> > > On Wednesday 14 June 2006 00:58, garrick at speculation.org wrote:
> > > > Yes, you are right.  That is an infinite loop.  But why is connect()
> > > > failing with EADDRNOTAVAIL?   "The specified address is not available
> > > > on the remote machine."  I don't know what that means.  Why would it
> > > > attempt a connection to any machine other than one with the specified
> > > > IP?
> >
> > Can you try the latest snap?  I changed it to a hybrid approach: it
> > tries bindresvport() once and falls back to looping around bind() with a
> > decrementing tryport.
> Yes, I just installed it and this one is working well. Still I wouldn't say it 
> works as expected because this bind()-loop hack should, in theory, not be 
> necessary, right?
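
For anyone following along, the hybrid approach looks roughly like the
sketch below.  This is not the actual TORQUE code: the function names
bind_hybrid() and bind_decrementing() are made up for illustration, and
the 1023-down-to-512 fallback range is my assumption.

```c
#define _DEFAULT_SOURCE        /* for the bindresvport() declaration on glibc */
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Fallback: walk downward from `start` to `low`, calling bind() on each
 * port.  Returns the bound port, or -1 with errno set.  (Hypothetical
 * helper, not a TORQUE function.) */
int bind_decrementing(int sock, int start, int low)
{
    struct sockaddr_in local;
    int tryport;

    memset(&local, 0, sizeof(local));
    local.sin_family = AF_INET;
    local.sin_addr.s_addr = htonl(INADDR_ANY);

    for (tryport = start; tryport >= low; tryport--) {
        local.sin_port = htons((unsigned short)tryport);
        if (bind(sock, (struct sockaddr *)&local, sizeof(local)) == 0)
            return tryport;
        if (errno != EADDRINUSE)
            return -1;         /* e.g. EACCES without root: retrying won't help */
    }
    errno = EADDRINUSE;        /* every port in the range was taken */
    return -1;
}

/* Hybrid: try bindresvport() exactly once, then fall back to the loop.
 * The fallback range is an assumption, not taken from TORQUE. */
int bind_hybrid(int sock)
{
    struct sockaddr_in local;

    memset(&local, 0, sizeof(local));
    local.sin_family = AF_INET;
    if (bindresvport(sock, &local) == 0)
        return ntohs(local.sin_port);

    return bind_decrementing(sock, IPPORT_RESERVED - 1, IPPORT_RESERVED / 2);
}
```

Binding below port 1024 normally needs root, so bind_hybrid() will fail
with EACCES for an ordinary user; bind_decrementing() can be tried
unprivileged with a high range.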

I'm definitely not a fan of the bind()-loop.  I wrote the bindresvport()
when I observed RHEL3 on x86_64 take several seconds to run through the

> If there is anything else that I may do to track down the "real" problem and 
> solution, let me know!

IMHO, bindresvport() doesn't like the link-local line, which sounds to
me like it could be a Linux kernel bug.  Of course, this isn't really
my area of expertise and I could be completely wrong.

The original bindresvport() code in TORQUE otherwise seems fine.  This
is the first I've heard of it not working, so I'm willing to think this
is a weird corner case with your configuration.

If you have support with Novell, you may want to put together a small
test case (socket(), bindresvport(), connect()) and take it up with

Or we just live with the new code, be happy it works, and not worry
about it :)
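
A test case along those lines could be as small as the sketch below.
The function name try_resv_connect() is made up, and I'm assuming plain
IPv4 over TCP; it reports which of the three steps fails and with what
errno.

```c
#define _DEFAULT_SOURCE        /* for the bindresvport() declaration on glibc */
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* socket(), bindresvport(), connect() in sequence.
 * Returns 0 on success, otherwise the errno of the first failing step. */
int try_resv_connect(const char *ip, unsigned short port)
{
    struct sockaddr_in local, remote;
    int sock, err;

    memset(&remote, 0, sizeof(remote));
    remote.sin_family = AF_INET;
    remote.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &remote.sin_addr) != 1) {
        fprintf(stderr, "bad IPv4 address: %s\n", ip);
        return EINVAL;
    }

    sock = socket(AF_INET, SOCK_STREAM, 0);
    if (sock < 0) {
        err = errno;
        perror("socket");
        return err;
    }

    /* Let libc pick a free reserved port (needs root). */
    memset(&local, 0, sizeof(local));
    local.sin_family = AF_INET;
    if (bindresvport(sock, &local) != 0) {
        err = errno;
        perror("bindresvport");
        close(sock);
        return err;
    }
    printf("bound to reserved port %d\n", ntohs(local.sin_port));

    /* This is the call that came back with EADDRNOTAVAIL in this thread. */
    if (connect(sock, (struct sockaddr *)&remote, sizeof(remote)) != 0) {
        err = errno;
        perror("connect");
        close(sock);
        return err;
    }

    puts("connect succeeded");
    close(sock);
    return 0;
}
```

Call it from a trivial main() as root, passing the IP and port of
whatever the server is trying to reach on the affected machine (both
are placeholders here), and compare the result with a machine that
doesn't show the problem.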
