[torqueusers] Re: FW: HPUX 11 failure torque 2.0.0p1, 2, 3 and 1.2.0p6

Garrick Staples garrick at usc.edu
Thu Dec 15 15:18:11 MST 2005


From what I gather, these problems stem from differences in the type of
the length argument to accept(), getsockopt(), and setsockopt() (a
pointer for the first two, passed by value to setsockopt()).  BSD has it
as an "int *"; earlier POSIX changed it to "size_t *" (which is unsigned,
and not always the same size as int); later POSIX changed it to
"socklen_t *" (typically unsigned, same size as int).

It was changed from int because of mixed signed/unsigned issues, but I
guess I'll just change them all back to int.  That seems to be the most
portable thing.
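
For what it's worth, one way to keep the length type in a single place is a small typedef that the build can override. This is only a sketch, not the TORQUE source: torque_socklen_t, TORQUE_SOCKLEN_T, and query_socket_type() are hypothetical names, and it assumes the platform headers declare socklen_t unless an override is given.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical build switch: pass e.g. -DTORQUE_SOCKLEN_T=int on
 * platforms whose prototypes want something other than socklen_t. */
#ifndef TORQUE_SOCKLEN_T
#define TORQUE_SOCKLEN_T socklen_t
#endif

typedef TORQUE_SOCKLEN_T torque_socklen_t;

/* Hypothetical helper: ask the kernel for the socket type, using the
 * single length typedef for the value-result length argument.
 * Returns the type (e.g. SOCK_STREAM) or -1 on failure. */
int query_socket_type(int sock)
{
  int type = -1;
  torque_socklen_t len = sizeof(type);

  if (getsockopt(sock, SOL_SOCKET, SO_TYPE, &type, &len) != 0)
    return -1;

  return type;
}
```

Building with -DTORQUE_SOCKLEN_T=int would select the BSD-style type on a platform whose prototypes still want "int *"; everywhere else the default socklen_t is used.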

On Thu, Dec 15, 2005 at 04:01:22PM -0600, Mike Coyne alleged:
> Here is a diff of net_server.c in src/lib/Libnet. I made the "second"
> change in the current p3 build of net_server, adding back the #if
> defined _SOCKLEN_T stuff. It appeared to correct the problem with the
> PBSE_BADCRED. This is a little premature; I still need to do an install
> on p3 and run it for a bit. This also included the previous fix on
> new_client.c 
> 
> HP-UX seems to have heartburn with socklen_t ... 
> 
> *** net_server.c        Wed Nov  9 00:38:22 2005
> --- /home/mcoyne/torque/torque-2.0.0p2/src/lib/Libnet/net_server.c    Fri Nov 11 03:27:08 2005
> ***************
> *** 259,270 ****
>     struct timeval timeout;
>     void close_conn();
>   
> -   timeout.tv_usec = 0;
> -   timeout.tv_sec  = waittime;
> - 
>     char tmpLine[1024];
>     char id[]="wait_request";
>   
>     selset = readset;  /* readset is global */
>   
>     n = select(FD_SETSIZE,&selset,(fd_set *)0,(fd_set *)0,&timeout);
> --- 259,270 ----
>     struct timeval timeout;
>     void close_conn();
>   
>     char tmpLine[1024];
>     char id[]="wait_request";
>   
> +   timeout.tv_usec = 0;
> +   timeout.tv_sec  = waittime;
> + 
>     selset = readset;  /* readset is global */
>   
>     n = select(FD_SETSIZE,&selset,(fd_set *)0,(fd_set *)0,&timeout);
> ***************
> *** 401,411 ****
>     int newsock;
>     struct sockaddr_in from;
>   
> - #if defined _SOCKLEN_T
>     socklen_t fromsize;
> - #else /* _SOCKLEN_T */
> -   int fromsize;
> - #endif /* _SCOKLEN_T */
>   
>     /* update lasttime of main socket */
>   
> --- 401,407 ----
> 
> -----Original Message-----
> From: Garrick Staples [mailto:garrick at usc.edu] 
> Sent: Thursday, December 15, 2005 2:24 PM
> To: Mike Coyne
> Cc: Lippert, Kenneth B.; torqueusers at supercluster.org
> Subject: Re: FW: HPUX 11 failure torque 2.0.0p1,2,3 and 1.2.0p6
> 
> On Thu, Dec 15, 2005 at 01:29:29PM -0600, Mike Coyne alleged:
> > There are some issues regarding HPUX and torque in versions after
> > 1.2.0p5 surrounding pbs_iff on the client and server side.  On the
> > client side, the file is src/lib/Netlib/net_client.c
> > 
> > Below is a diff between 2.0.0p3 and 1.2.0p5. In order to get pbs_iff
> > to connect from a remote host (one of the mom clients) I had to
> > backport the older version of this file.
> 
> The bits with tv_sec and select() don't look important to me.
> 
> The important part might be the size of 'one'.  I'm thinking it should
> be an int, not a long.  Can you try just that one change in p3?
> 
> @@ -177,7 +172,7 @@ int client_to_svr(
>    int                sock;
>    unsigned short     tryport;
>    int                flags;
> -  int                one = 1;
> +  long               one = 1;
>    
>    local.sin_family = AF_INET;
>    local.sin_addr.s_addr = 0;
> 
> 
> The argument changes to setsockopt() appear correct to me, especially
> the last argument.
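
A rough illustration of why the width of 'one' can matter. The kernel reads only as many bytes of the option buffer as the length argument says; on a big-endian platform where long is wider than int, the leading bytes of a long holding 1 are all zero, so an int-sized read sees 0. Whether that is what happens on the HP-UX build in question is an assumption. first_int_of() and enable_reuseaddr() are hypothetical helpers, SO_REUSEADDR is only a stand-in (the thread does not say which option is being set), and a 4-byte int is assumed.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical helper: mimic a reader that consumes only sizeof(int)
 * bytes from the start of an option buffer. */
int first_int_of(const void *optval)
{
  int v;

  memcpy(&v, optval, sizeof(v));
  return v;
}

/* Hypothetical helper: the option value is an int and the length is
 * taken from the same variable, so value and length cannot disagree.
 * SO_REUSEADDR is a stand-in option, not necessarily the one TORQUE sets. */
int enable_reuseaddr(int sock)
{
  int one = 1;

  return setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));
}
```

For example, the byte sequence 00 00 00 00 00 00 00 01 (a big-endian 64-bit 1) passed to first_int_of() yields 0, while an int holding 1 yields 1.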
> 
>  
> > In order to get src/resmom/hpux11 (or hpux10) / mom_mach.c  to compile I
> > added 
> > 
> >  
> > 
> >  extern  int     ignwalltime;
> 
> Ouch.  Fixed in CVS.
> 
> 
> > The remaining problem is as follows:
> > 
> > pbs_iff disconnects with an invalid credential ==>PBSE_BADCRED in
> > src/server/process_request.c from 
> 
> This would imply the bind() to a privileged port isn't working.
> 
> Do you have bindresvport() on HPUX?
> 
> 
>  
> > The output from gdb's server_conn has a suspicious cn_addr.  The
> > connection was from a qstat on the same host as the server, although
> > this may be fallout from a previous authentication error?
> 
> > (gdb) print svr_conn[sfds]
> > 
> > $1 = {cn_addr = 2147483649, cn_handle = -1, cn_port = 40696, cn_authen = 0, 
> 
> cn_port should probably be less than 1024 at that point.
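
To make that check concrete, here is a small sketch of a reserved-port test. It is not the code in process_request.c; is_privileged_port(), peer_is_privileged(), and PRIV_PORT_LIMIT are hypothetical names, and 1024 is the conventional boundary below which only root may bind.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical constant: ports below this are conventionally reserved
 * for root on Unix systems. */
#define PRIV_PORT_LIMIT 1024

/* Returns nonzero if the port (host byte order) is in the reserved range. */
int is_privileged_port(unsigned int port)
{
  return port < PRIV_PORT_LIMIT;
}

/* Returns nonzero if the peer address, as filled in by accept(), carries
 * a reserved source port.  sin_port is in network byte order, so it is
 * converted before comparing. */
int peer_is_privileged(const struct sockaddr_in *peer)
{
  return is_privileged_port(ntohs(peer->sin_port));
}
```

With the value from the gdb output above, is_privileged_port(40696) returns 0, which fits the idea that the client never got bound to a reserved port.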
> 
> -- 
> Garrick Staples, Linux/HPCC Administrator
> University of Southern California
> 

-- 
Garrick Staples, Linux/HPCC Administrator
University of Southern California