[torqueusers] Torque 2.1.x pbs_server process hogging cpu
martin.schaffoener at e-technik.uni-magdeburg.de
Thu Jun 15 04:51:31 MDT 2006
On Thursday 15 June 2006 10:38, garrick at speculation.org wrote:
> IMHO, bindresvport() doesn't like the link-local line; which sounds to
> me like it could be a linux kernel bug. Of course, this isn't really
> my area of expertise and I could be completely wrong.
I don't think this can be the problem, because other services on
Linux (I read somewhere that tcpwrappers, or something in that area, does this)
use bindresvport() too, and they work fine. But sometimes it's the
What bothers me, though, is the fact that client_to_svr() succeeds when
connecting to the scheduler (moab in my case) or when starting the job on a
node, but that it fails only at the second attempt to connect to the mom.
BTW, I also noticed moms infinitely looping after jobs had finished,
apparently when they were trying to notify the server.
> The original bindresvport() code in TORQUE otherwise seems fine. This
I think so, too.
> is the first I've heard of it not working, so I'm willing to think this
> is a weird corner case with your configuration.
I have to protest! Everything is configured right [tm] on my cluster!
> If you have support with Novell, you may want to put together a small
> test case (socket(), bindresvport(), connect()) and take it up with
I guess I would have to set up a listener (like with netcat) and write this
small test case looping around. Maybe next week or so...
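
A minimal sketch of what such a looping test case might look like, assuming
Linux with glibc. It is not taken from the TORQUE source: the function name
attempt_once(), the RESVPORT_MAIN macro and the default of 100 attempts are
made up for illustration. Each attempt does socket(), bindresvport() and
connect() against a listener and reports the errno of the step that failed.

```c
#define _DEFAULT_SOURCE          /* glibc: expose bindresvport() */
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: one socket() / bindresvport() / connect() round trip.
 * Returns 0 on success, otherwise the errno of the step that failed. */
int attempt_once(const char *ip, int port)
{
    struct sockaddr_in local, remote;
    int fd, err;

    assert(ip != NULL);

    memset(&remote, 0, sizeof remote);
    remote.sin_family = AF_INET;
    remote.sin_port = htons((unsigned short)port);
    if (inet_pton(AF_INET, ip, &remote.sin_addr) != 1)
        return EINVAL;                      /* not a dotted-quad IPv4 address */

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return errno;

    /* Zeroed address: any local interface; the library picks a reserved port. */
    memset(&local, 0, sizeof local);
    local.sin_family = AF_INET;
    if (bindresvport(fd, &local) < 0) {     /* needs root, else EACCES */
        err = errno;
        close(fd);
        return err;
    }

    if (connect(fd, (struct sockaddr *)&remote, sizeof remote) < 0) {
        err = errno;
        close(fd);
        return err;
    }

    close(fd);
    return 0;
}

#ifdef RESVPORT_MAIN
/* Usage: ./resvtest <ipv4-address> <port> [attempts] */
int main(int argc, char **argv)
{
    int i, failures = 0, attempts;

    if (argc < 3) {
        fprintf(stderr, "usage: %s <ipv4-address> <port> [attempts]\n", argv[0]);
        return 2;
    }
    attempts = (argc > 3) ? atoi(argv[3]) : 100;

    for (i = 0; i < attempts; i++) {
        int err = attempt_once(argv[1], atoi(argv[2]));
        if (err != 0) {
            failures++;
            printf("attempt %d failed: %s\n", i + 1, strerror(err));
        }
    }
    printf("%d of %d attempts failed\n", failures, attempts);
    return failures ? 1 : 0;
}
#endif
```

Built with something like "cc -DRESVPORT_MAIN -o resvtest resvtest.c" and run
as root (reserved ports need privileges) against a host where a listener such
as netcat is accepting connections on the chosen port. Repeated attempts are
the point here, since the failure described above shows up only on the second
connect to the mom.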
> Or we just live with the new code, be happy it works, and not worry
> about it :)
For the moment, yes.
Cognitive Systems Group, Institute of Electronics, Signal Processing and
Communication Technologies, Department of Electrical Engineering,
Otto-von-Guericke University Magdeburg
Phone: +49 391 6720063