[torquedev] read_nonblocking_socket() wtf?

Michael Barnes Michael.Barnes at jlab.org
Fri Jul 16 14:29:32 MDT 2010

On Jul 16, 2010, at 4:06 PM, Garrick Staples wrote:

> I've been looking into a problem regarding maui sometimes hanging in a read()
> on its socket to pbs_server. The hangs happen in pbs_disconnect() after a
> normal timeout. I thought this was weird because we define read() to be
> read_nonblocking_socket() which is a nice little 30-second loop around a
> nonblocking read().
> The define to read_nonblocking_socket() replaces a blocking read wrapped with
> an ALRM of pbs_tcp_timeout seconds.
> So why would maui hang on a non-blocking read()? Is there something broken in
> my kernel? What a mystery!
> It turns out that read_nonblocking_socket does the exact opposite of what it
> says because the fcntl() call is commented out! WTF? A neat little ALRM-wrapped
> read() call is replaced with a broken hard-wired implementation.
> I'm on 2.1.x. Is this all fixed up in later branches?

I'm on 2.1.10, and my read_nonblocking_socket() has the proper
fcntl() call, but we still see hangs from Maui from time to time.
I believe the hangs are on the order of 15 minutes or so.


| Michael Barnes
| Thomas Jefferson National Accelerator Facility
| Scientific Computing Group
| 12000 Jefferson Ave.
| Newport News, VA 23606
| (757) 269-7634
