[torquedev] poll() vs select() in torque

Josh Butikofer josh at clusterresources.com
Mon Mar 23 16:23:19 MDT 2009


There is actually a patch we have that fixes this for most operating systems.

Instead of moving over to poll(), which would change quite a bit of code, we 
simply create a fd_set pointer that points to memory we allocate, in increments 
of FD_SETSIZE. If we have more descriptors, we make this piece of memory larger. 
The FD_* macros don't mind if the fd_set pointer you pass in points to memory 
larger than FD_SETSIZE, and neither does select(). Our research has shown that 
this is an acceptable way of handling the descriptor limit problem for at least 
the Linux and FreeBSD operating systems (and glibc). We do not know if other 
operating systems/C libraries define the FD_* macros or select in a way that 
would cause problems with this method.
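
A minimal sketch of the allocation idea described above, not the actual patch. 
The helper names fdset_bytes, fdset_alloc and fdset_grow are hypothetical. It 
assumes a platform where fd_set is a plain bit array, as on Linux and FreeBSD. 
Note that FD_ZERO() only clears sizeof(fd_set) bytes, so the larger buffer is 
cleared with calloc()/memset() instead. Also be aware that newer glibc builds 
compiled with _FORTIFY_SOURCE may bounds-check the FD_* macros against 
FD_SETSIZE, so this should be verified on the target system.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/select.h>

/* Bytes needed to track max_fds descriptors, rounded up to whole
 * fd_set-sized chunks (i.e. increments of FD_SETSIZE descriptors). */
static size_t fdset_bytes(size_t max_fds)
{
    size_t chunks = (max_fds + FD_SETSIZE - 1) / FD_SETSIZE;
    if (chunks == 0)
        chunks = 1;              /* always allocate at least one fd_set */
    return chunks * sizeof(fd_set);
}

/* Allocate a zeroed descriptor set large enough for max_fds descriptors.
 * Returns NULL if the allocation fails. */
static fd_set *fdset_alloc(size_t max_fds)
{
    return calloc(1, fdset_bytes(max_fds));
}

/* Grow a set from old_max to new_max descriptors, zeroing the new tail.
 * Returns NULL on allocation failure; the old set is still valid then. */
static fd_set *fdset_grow(fd_set *set, size_t old_max, size_t new_max)
{
    size_t old_bytes = fdset_bytes(old_max);
    size_t new_bytes = fdset_bytes(new_max);
    fd_set *bigger;

    assert(set != NULL);
    if (new_bytes <= old_bytes)
        return set;              /* already large enough */

    bigger = realloc(set, new_bytes);
    if (bigger == NULL)
        return NULL;
    memset((char *)bigger + old_bytes, 0, new_bytes - old_bytes);
    return bigger;
}
```

The resulting pointer is passed to FD_SET(), FD_ISSET() and select() in place 
of the address of a stack-allocated fd_set.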

The patch also polls the OS to find out what your ulimit -n is set to and uses 
this as the maximum file descriptor count (instead of the default 1024).
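
For illustration, one way to query that limit is getrlimit() with 
RLIMIT_NOFILE, which reports the same soft limit that ulimit -n shows. This is 
an assumption about the mechanism, not a description of what the patch does, 
and the helper name max_descriptors is hypothetical.

```c
#include <assert.h>
#include <sys/resource.h>

/* Return the soft limit on open file descriptors for this process.
 * Falls back to the caller-supplied value if the limit cannot be read
 * or is reported as unlimited. */
static long max_descriptors(long fallback)
{
    struct rlimit rl;

    assert(fallback > 0);
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return fallback;         /* query failed */
    if (rl.rlim_cur == RLIM_INFINITY)
        return fallback;         /* no finite limit to size against */
    return (long)rl.rlim_cur;
}
```

Calling max_descriptors(1024) keeps the old default as the fallback.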

In the future, we would like to move away from select() and use poll() or, 
better yet, epoll on Linux and kqueue on BSD. The latter two are more efficient 
than poll() and select() because their per-call cost does not grow as O(n) with 
the number of watched descriptors.

According to my manpage, poll() is part of the POSIX standard, so other modern 
OSes should have it as well.
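
For comparison, a minimal sketch of what a poll()-based wait could look like. 
It is not taken from torque, and the helper name wait_readable is hypothetical. 
Because poll() takes an array of struct pollfd rather than a bitmask, it is not 
bound by FD_SETSIZE.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>

/* Wait up to timeout_ms milliseconds for any of the nfds descriptors in
 * fds[] to become readable. pfds[] is caller-provided scratch space of
 * at least nfds entries; on return its revents fields show which
 * descriptors are ready. Returns the number of ready descriptors,
 * 0 on timeout, or -1 on error (see errno). */
static int wait_readable(const int *fds, struct pollfd *pfds,
                         nfds_t nfds, int timeout_ms)
{
    nfds_t i;

    assert(nfds == 0 || (fds != NULL && pfds != NULL));
    for (i = 0; i < nfds; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;   /* interested in readability only */
        pfds[i].revents = 0;
    }
    return poll(pfds, nfds, timeout_ms);
}
```

After a positive return, the caller checks pfds[i].revents & POLLIN for each 
entry, which is still a linear scan over the watched descriptors.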

I will dig up the patch and post it to the mailing list.

Oh, and by the way, customers have been using this patch heavily in production 
for several months now.

Josh Butikofer
Cluster Resources, Inc.
#############################


Michael Barnes wrote:
> At our site, we are having issues with pbs_server running out of 
> file descriptors.  We have increased the number of available
> file descriptors to the process, and then we are seeing similar
> problems under heavy load.
> 
> I believe that the reason for these problems is that the number
> of filehandles now goes beyond FD_SETSIZE.
> 
> Any suggestions on fixing this?  Is it OK just to redefine
> __FD_SETSIZE or should I go through and change all of the select() calls to
> poll()?  If it's best to change over to poll(), is this something that
> should be integrated into torque?
> 
> This will be relatively easy for Linux, but I don't know how portable
> poll() is on other systems.
> 
> Comments, suggestions?
> 
> -mb
> 

