[torqueusers] Re: FW: HPUX 11 failure torque 2.0.0p1,2,3 and 1.2.0p6
Garrick Staples
garrick at usc.edu
Thu Dec 15 15:18:11 MST 2005
From what I gather, these problems stem from differences in the type of
the length argument to accept(), getsockopt(), and setsockopt(). BSD has
it as an "int *" (a plain int for setsockopt()), earlier POSIX changed
it to "size_t *" (which is unsigned, and not always the same size as
int), and later POSIX changed it to "socklen_t *" (unsigned, same size
as int).
The code was changed away from int because of mixed signed/unsigned
issues, but I guess I'll just change them all back to int. That seems
to be the most portable thing.
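
A minimal sketch of one way to keep the call sites identical whichever
type a platform wants. Everything named here is an assumption made for
illustration and is not taken from the torque source: the macro
TORQUE_SOCKLEN_IS_INT (which a configure test would have to define on
platforms whose prototypes take "int *"), the typedef sock_len_arg_t,
and the helper bound_port().

```c
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical switch: define TORQUE_SOCKLEN_IS_INT where accept(),
 * getsockname() and getsockopt() are declared with "int *". */
#ifdef TORQUE_SOCKLEN_IS_INT
typedef int sock_len_arg_t;
#else
typedef socklen_t sock_len_arg_t;
#endif

/* Binds a loopback TCP socket to an ephemeral port and returns the
 * port number in host byte order, or -1 on any failure. */
int bound_port(void)
{
    struct sockaddr_in addr;
    sock_len_arg_t len = (sock_len_arg_t)sizeof(addr);
    int port = -1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0; /* let the kernel pick a port */

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0 &&
        getsockname(fd, (struct sockaddr *)&addr, &len) == 0)
        port = ntohs(addr.sin_port);

    close(fd);
    return port;
}
```

The point is only that the length variable is declared once through the
typedef, so no call site needs its own #if block.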
On Thu, Dec 15, 2005 at 04:01:22PM -0600, Mike Coyne alleged:
> Here is a diff of net_server.c in src/lib/Libnet. I made the "second"
> change in the current p3 build of net_server.c, adding back the
> #if defined _SOCKLEN_T stuff. It appears to correct the problem with
> PBSE_BADCRED. This is a little premature; I still need to do an
> install of p3 and run it for a while. This also includes the previous
> fix to new_client.c.
>
> HP-UX seems to have heartburn with socklen_t ...
>
> *** net_server.c Wed Nov 9 00:38:22 2005
> --- /home/mcoyne/torque/torque-2.0.0p2/src/lib/Libnet/net_server.c
> Fri Nov 11 03:27:08 2005
> ***************
> *** 259,270 ****
> struct timeval timeout;
> void close_conn();
>
> - timeout.tv_usec = 0;
> - timeout.tv_sec = waittime;
> -
> char tmpLine[1024];
> char id[]="wait_request";
>
> selset = readset; /* readset is global */
>
> n = select(FD_SETSIZE,&selset,(fd_set *)0,(fd_set *)0,&timeout);
> --- 259,270 ----
> struct timeval timeout;
> void close_conn();
>
> char tmpLine[1024];
> char id[]="wait_request";
>
> + timeout.tv_usec = 0;
> + timeout.tv_sec = waittime;
> +
> selset = readset; /* readset is global */
>
> n = select(FD_SETSIZE,&selset,(fd_set *)0,(fd_set *)0,&timeout);
> ***************
> *** 401,411 ****
> int newsock;
> struct sockaddr_in from;
>
> - #if defined _SOCKLEN_T
> socklen_t fromsize;
> - #else /* _SOCKLEN_T */
> - int fromsize;
> - #endif /* _SCOKLEN_T */
>
> /* update lasttime of main socket */
>
> --- 401,407 ----
>
> -----Original Message-----
> From: Garrick Staples [mailto:garrick at usc.edu]
> Sent: Thursday, December 15, 2005 2:24 PM
> To: Mike Coyne
> Cc: Lippert, Kenneth B.; torqueusers at supercluster.org
> Subject: Re: FW: HPUX 11 failure torque 2.0.0p1,2,3 and 1.2.0p6
>
> On Thu, Dec 15, 2005 at 01:29:29PM -0600, Mike Coyne alleged:
> > There are some issues regarding HPUX and torque in versions after
> > 1.2.0p5 surrounding pbs_iff on the client and server side. On the
> > client side, the file is src/lib/Libnet/net_client.c.
> >
> > Below is a diff between 2.0.0p3 and 1.2.0p5. In order to get pbs_iff
> > to connect from a remote host (one of the mom clients) I had to
> > backport the older version of this file.
>
> The bits with tv_sec and select() don't look important to me.
>
> The important part might be the size of 'one'. I'm thinking it should
> be an int, not a long. Can you try just that one change in p3?
>
> @@ -177,7 +172,7 @@ int client_to_svr(
> int sock;
> unsigned short tryport;
> int flags;
> - int one = 1;
> + long one = 1;
>
> local.sin_family = AF_INET;
> local.sin_addr.s_addr = 0;
>
>
> The argument changes to setsockopt() appear correct to me, especially
> the last argument.
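
A small sketch of why the width of 'one' can matter. If long is wider
than int and the machine is big-endian, a callee that reads only
sizeof(int) bytes of a long holding 1 sees the high-order half, which
is 0. The helper names below are made up for illustration, and
SO_REUSEADDR is used only as an example option; the thread does not say
which option net_client.c actually sets.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Returns what a callee sees if it reads only the first sizeof(int)
 * bytes of a long. This is 1 on little-endian machines and wherever
 * long and int have the same size, and 0 on big-endian machines with
 * a wider long. */
int leading_int_of_long(long value)
{
    int out;

    memcpy(&out, &value, sizeof(out));
    return out;
}

/* Sets a boolean socket option with an int value whose size matches
 * the length argument. Returns the setsockopt() result: 0 on success,
 * -1 on error. */
int enable_reuseaddr(int sock)
{
    int one = 1;

    return setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));
}
```

Keeping the variable an int and passing sizeof(one) keeps the value and
its stated length in agreement on every platform.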
>
>
> > In order to get src/resmom/hpux11 (or hpux10)/mom_mach.c to compile
> > I added
> >
> > extern int ignwalltime;
>
> Ouch. Fixed in CVS.
>
>
> > The remaining problem is as follows:
> >
> > pbs_iff disconnects with an invalid credential ==> PBSE_BADCRED in
> > src/server/process_request.c from
>
> This would imply the bind() to a privileged port isn't working.
>
> Do you have bindresvport() on HPUX?
>
>
>
> > The output from gdb's svr_conn has a suspicious cn_addr. The
> > connection was from a qstat on the same host as the server, although
> > this may be fallout from a previous authentication error?
>
> > (gdb) print svr_conn[sfds]
> >
> > $1 = {cn_addr = 2147483649, cn_handle = -1, cn_port = 40696, cn_authen = 0,
>
> cn_port should probably be less than 1024 at that point.
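
A rough sketch of the two sanity checks implied by that gdb output. It
assumes cn_addr and cn_port are held in host byte order, as gdb printed
them; the helper names and RESERVED_PORT_LIMIT are made up for
illustration. Read as a host-order IPv4 address, 2147483649 is
0x80000001, i.e. 128.0.0.1, whereas 127.0.0.1 would be 2130706433.

```c
/* Assumption: both values are in host byte order. */
#define RESERVED_PORT_LIMIT 1024UL

/* True if the port is in the range only root may bind to. */
int port_is_privileged(unsigned long port)
{
    return port < RESERVED_PORT_LIMIT;
}

/* True if the host-order IPv4 address lies in 127.0.0.0/8. */
int addr_is_loopback(unsigned long addr)
{
    return ((addr >> 24) & 0xffUL) == 127UL;
}
```

With the values from the gdb output, port_is_privileged(40696) is false
and addr_is_loopback(2147483649UL) is false. A same-host connection
could legitimately arrive from the host's own address instead of
loopback, so the second check is only a hint.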
>
> --
> Garrick Staples, Linux/HPCC Administrator
> University of Southern California
>
--
Garrick Staples, Linux/HPCC Administrator
University of Southern California