[torquedev] read_nonblocking_socket() wtf?

Garrick Staples garrick at usc.edu
Fri Jul 16 14:06:38 MDT 2010


I've been looking into a problem where maui sometimes hangs in a read()
on its socket to pbs_server. The hangs happen in pbs_disconnect() after a
normal timeout. I thought this was weird because we define read() to be
read_nonblocking_socket(), which is a nice little 30-second loop around a
nonblocking read().

That define replaces the old approach, a blocking read() wrapped with an
ALRM of pbs_tcp_timeout seconds.
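
For anyone who hasn't seen the pattern, an ALRM-wrapped blocking read looks
roughly like the sketch below. This is not the TORQUE source:
read_with_alarm() and on_alarm() are hypothetical names, and timeout_secs
stands in for pbs_tcp_timeout.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Empty handler: its only job is to make the blocked read() fail with EINTR. */
static void on_alarm(int sig) { (void)sig; }

/* Hypothetical helper (not TORQUE code): a blocking read() bounded by SIGALRM.
 * Returns bytes read, 0 on EOF, or -1 with errno set (ETIMEDOUT on timeout). */
ssize_t read_with_alarm(int fd, void *buf, size_t len, unsigned timeout_secs)
{
    struct sigaction sa, old;
    ssize_t n;
    int saved_errno;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;                 /* no SA_RESTART, so read() is interrupted */
    if (sigaction(SIGALRM, &sa, &old) < 0)
        return -1;

    alarm(timeout_secs);             /* arm the timer */
    n = read(fd, buf, len);          /* blocks until data, EOF, or a signal */
    saved_errno = errno;
    alarm(0);                        /* disarm the timer */
    sigaction(SIGALRM, &old, NULL);  /* restore the previous handler */

    /* Simplification: any EINTR is reported as a timeout, even if a
     * different signal caused it. */
    if (n < 0 && saved_errno == EINTR)
        saved_errno = ETIMEDOUT;
    errno = saved_errno;
    return n;
}
```

The sketch has the usual small race (the alarm could fire before read() is
entered), which is one reason to prefer a properly non-blocking read.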

So why would maui hang on a non-blocking read()? Is there something broken in
my kernel? What a mystery!

It turns out that read_nonblocking_socket() does the exact opposite of what
its name says, because the fcntl() call is commented out, so the socket is
never switched to non-blocking mode! WTF? A neat little ALRM-wrapped read()
call has been replaced with a broken hard-wired implementation.
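
For comparison, a loop that really is non-blocking has to set O_NONBLOCK with
fcntl() before it starts reading. A minimal sketch, assuming a 10 ms polling
interval and hypothetical names (set_nonblocking(),
read_nonblocking_sketch()); it is not the 2.1.x code:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: switch fd to non-blocking mode (the fcntl() step). */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Hypothetical sketch: retry a non-blocking read() until data arrives or
 * timeout_secs elapse. Returns bytes read, 0 on EOF, or -1 with errno set
 * (ETIMEDOUT on timeout). Leaves fd in non-blocking mode. */
ssize_t read_nonblocking_sketch(int fd, void *buf, size_t len, int timeout_secs)
{
    time_t deadline = time(NULL) + timeout_secs;
    struct timespec pause = { 0, 10 * 1000 * 1000 };  /* 10 ms between tries */

    if (set_nonblocking(fd) < 0)
        return -1;

    for (;;) {
        ssize_t n = read(fd, buf, len);
        if (n >= 0)
            return n;                    /* data or EOF */
        if (errno != EAGAIN && errno != EWOULDBLOCK && errno != EINTR)
            return -1;                   /* real error */
        if (time(NULL) >= deadline) {
            errno = ETIMEDOUT;
            return -1;
        }
        nanosleep(&pause, NULL);         /* avoid a busy spin */
    }
}
```

Drop the set_nonblocking() call and the first read() in the loop simply
blocks when no data is pending, so the deadline is never checked.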

I'm on 2.1.x. Is this all fixed up in later branches?


-- 
Garrick Staples, GNU/Linux HPCC SysAdmin
University of Southern California

Life is Good!