[torqueusers] Re: FW: HPUX 11 failure torque 2.0.0p1, 2, 3 and 1.2.0p6
Garrick Staples
garrick at usc.edu
Thu Dec 15 16:16:13 MST 2005
Did I get everything right in this p4 snap?
http://www.clusterresources.com/downloads/torque/snapshots/torque-2.0.0p4-snap.1134687812.tar.gz
On Thu, Dec 15, 2005 at 02:18:11PM -0800, Garrick Staples alleged:
> From what I gather, these problems stem from differences in the last arg
> to accept(), getsockopt(), and setsockopt(). BSD has it as an "int *",
> earlier POSIX changed it to "size_t *" (which is unsigned, and not the
> same size as int), and later POSIX changed it to "socklen_t *" (unsigned,
> same size as int).
>
> It was changed from int because of mixed-signedness issues, but I guess
> I'll just change them all back to int. That seems to be the most portable
> thing.
>
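For readers following the length-argument discussion, here is a minimal sketch of isolating that type behind one typedef so the choice is made in a single place. The names sock_len_arg_t, SOCK_LEN_IS_INT and query_socket_type() are hypothetical and are not TORQUE identifiers; the sketch assumes a platform whose headers declare socklen_t unless the switch is defined.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical typedef and build switch (not TORQUE names): pick the type
 * that the platform's prototypes expect for the length argument. */
#if defined(SOCK_LEN_IS_INT)
typedef int sock_len_arg_t;
#else
typedef socklen_t sock_len_arg_t;
#endif

/* Return the socket type (e.g. SOCK_STREAM) of fd, or -1 on error. */
int query_socket_type(int fd)
{
  int type = 0;
  sock_len_arg_t len = sizeof(type);  /* in: buffer size, out: bytes written */

  if (getsockopt(fd, SOL_SOCKET, SO_TYPE, &type, &len) != 0)
    return -1;

  /* The value-result argument should report that an int was written. */
  assert((size_t)len == sizeof(type));
  return type;
}
```

If the length variable is wider than what the call expects, the callee updates only part of it, which is the kind of mismatch described above.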
> On Thu, Dec 15, 2005 at 04:01:22PM -0600, Mike Coyne alleged:
> > Here is a diff of net_server.c in src/lib/Libnet. I made the "second"
> > change in the current p3 build of net_server added backing the #if
> > defined _SOCKLEN_T stuff. It appeared to correct the problem with the
> > PBSE_BADCRED. This is a little premature; I need to do an install on p3
> > and run it a bit, though. This also included the previous fix on
> > new_client.c
> >
> > HPUX seems to have heartburn with socklen_t ...
> >
> > *** net_server.c Wed Nov 9 00:38:22 2005
> > --- /home/mcoyne/torque/torque-2.0.0p2/src/lib/Libnet/net_server.c
> > Fri Nov 11 03:27:08 2005
> > ***************
> > *** 259,270 ****
> > struct timeval timeout;
> > void close_conn();
> >
> > - timeout.tv_usec = 0;
> > - timeout.tv_sec = waittime;
> > -
> > char tmpLine[1024];
> > char id[]="wait_request";
> >
> > selset = readset; /* readset is global */
> >
> > n = select(FD_SETSIZE,&selset,(fd_set *)0,(fd_set *)0,&timeout);
> > --- 259,270 ----
> > struct timeval timeout;
> > void close_conn();
> >
> > char tmpLine[1024];
> > char id[]="wait_request";
> >
> > + timeout.tv_usec = 0;
> > + timeout.tv_sec = waittime;
> > +
> > selset = readset; /* readset is global */
> >
> > n = select(FD_SETSIZE,&selset,(fd_set *)0,(fd_set *)0,&timeout);
> > ***************
> > *** 401,411 ****
> > int newsock;
> > struct sockaddr_in from;
> >
> > - #if defined _SOCKLEN_T
> > socklen_t fromsize;
> > - #else /* _SOCKLEN_T */
> > - int fromsize;
> > - #endif /* _SCOKLEN_T */
> >
> > /* update lasttime of main socket */
> >
> > --- 401,407 ----
> >
> > -----Original Message-----
> > From: Garrick Staples [mailto:garrick at usc.edu]
> > Sent: Thursday, December 15, 2005 2:24 PM
> > To: Mike Coyne
> > Cc: Lippert, Kenneth B.; torqueusers at supercluster.org
> > Subject: Re: FW: HPUX 11 failure torque 2.0.0p1,2,3 and 1.2.0p6
> >
> > On Thu, Dec 15, 2005 at 01:29:29PM -0600, Mike Coyne alleged:
> > > There are some issues regarding HPUX and torque in versions after
> > > 1.2.0p5 surrounding pbs_iff on the client and server side. On the
> > > client side, src/lib/Libnet/net_client.c
> > >
> > >
> > >
> > > Below is a diff between 2.0.0.p3 and 1.2.0.p5. In order to get pbs_iff
> > > to connect from a remote host (one of the mom clients), I had to
> > > backport the older version of this file.
> >
> > The bits with tv_sec and select() don't look important to me.
> >
> > The important part might be the size of 'one'. I'm thinking it should
> > be an int, not a long. Can you try just that one change in p3?
> >
> > @@ -177,7 +172,7 @@ int client_to_svr(
> > int sock;
> > unsigned short tryport;
> > int flags;
> > - int one = 1;
> > + long one = 1;
> >
> > local.sin_family = AF_INET;
> > local.sin_addr.s_addr = 0;
> >
> >
> > The argument changes to setsockopt() appear correct to me, especially
> > the last argument.
> >
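A minimal sketch of why the width of 'one' can matter: setsockopt() reads exactly as many bytes as its last argument says, so on a 64-bit (LP64) big-endian build, a long holding 1 passed with an int-sized length would present its zero high-order half. SO_REUSEADDR is used here only as a familiar boolean option; which option client_to_svr() actually sets is not shown in this thread, and both function names are hypothetical.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Enable SO_REUSEADDR on fd. Returns 0 on success, -1 on error. */
int enable_reuseaddr(int fd)
{
  int one = 1;  /* int, so sizeof(one) matches what the option expects */

  return setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));
}

/* Read the option back: 1 if set, 0 if clear, -1 on error. */
int reuseaddr_is_set(int fd)
{
  int value = 0;
  socklen_t len = sizeof(value);

  if (getsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &value, &len) != 0)
    return -1;

  assert((size_t)len == sizeof(value));  /* option value is int-sized */
  return value != 0;
}
```

Reading the option back after setting it is a cheap way to confirm on the target platform that the value really arrived as nonzero.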
> >
> > > In order to get src/resmom/hpux11 (or hpux10)/mom_mach.c to compile, I
> > > added
> > >
> > >
> > >
> > > extern int ignwalltime;
> >
> > Ouch. Fixed in CVS.
> >
> >
> > > The remaining problem is as follows,
> > >
> > > pbs_iff disconnects with invalid credential ==> PBSE_BADCRED in
> > > src/server/process_request.c from
> >
> > This would imply the bind() to a privileged port isn't working.
> >
> > Do you have bindresvport() on HPUX?
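Where bindresvport() is not available, a fallback can be sketched as a simple downward scan over the reserved range. This is an illustration under stated assumptions, not the code TORQUE uses: bind_reserved_port() is a hypothetical name, and the scan range (IPPORT_RESERVED / 2 up to IPPORT_RESERVED - 1) is a conventional choice rather than something taken from this thread.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Try to bind a new TCP socket to a reserved port, scanning downward from
 * IPPORT_RESERVED - 1. On success, return the socket and store the port in
 * *port_out. Return -1 if no reserved port could be bound, for example
 * when the caller lacks the required privilege. */
int bind_reserved_port(int *port_out)
{
  struct sockaddr_in local;
  int port;
  int fd;

  assert(port_out != NULL);

  fd = socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0)
    return -1;

  for (port = IPPORT_RESERVED - 1; port >= IPPORT_RESERVED / 2; port--) {
    memset(&local, 0, sizeof(local));
    local.sin_family = AF_INET;
    local.sin_addr.s_addr = htonl(INADDR_ANY);
    local.sin_port = htons((unsigned short)port);

    if (bind(fd, (struct sockaddr *)&local, sizeof(local)) == 0) {
      *port_out = port;
      return fd;
    }
  }

  close(fd);
  return -1;
}
```

A caller that gets -1 back would connect from an unprivileged port, which a server that checks the source port would then be expected to reject.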
> >
> >
> >
> > > The output from gdb's server_conn has a suspicious cn_addr; the
> > > connection was from a qstat on the same host as the server, although
> > > this may be fallout from a previous authentication error?
> >
> > > (gdb) print svr_conn[sfds]
> > >
> > > $1 = {cn_addr = 2147483649, cn_handle = -1, cn_port = 40696, cn_authen = 0,
> >
> > cn_port should probably be less than 1024 at that point.
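The two fields in the gdb output can be decoded with a short sketch. It assumes cn_addr and cn_port are held in host byte order, which this thread does not confirm, and both helper names are hypothetical. Under that assumption 2147483649 is 0x80000001, which prints as 128.0.0.1, and 40696 falls outside the reserved range below IPPORT_RESERVED (1024).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <netinet/in.h>

/* Format a host-byte-order IPv4 address as dotted-quad text in buf. */
void format_host_order_ipv4(unsigned long addr, char *buf, size_t buflen)
{
  assert(buf != NULL && buflen >= 16);  /* "255.255.255.255" plus NUL */

  snprintf(buf, buflen, "%lu.%lu.%lu.%lu",
           (addr >> 24) & 0xffUL, (addr >> 16) & 0xffUL,
           (addr >> 8) & 0xffUL, addr & 0xffUL);
}

/* Return 1 if a host-byte-order port lies in the reserved range. */
int port_is_privileged(unsigned int port)
{
  return port < (unsigned int)IPPORT_RESERVED;
}
```

Calling format_host_order_ipv4(2147483649UL, buf, sizeof(buf)) with a 16-byte buffer yields "128.0.0.1", and port_is_privileged(40696) returns 0.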
> >
> > --
> > Garrick Staples, Linux/HPCC Administrator
> > University of Southern California
> >
>
> --
> Garrick Staples, Linux/HPCC Administrator
> University of Southern California
--
Garrick Staples, Linux/HPCC Administrator
University of Southern California