[torqueusers] Re: FW: HPUX 11 failure torque 2.0.0p1, 2, 3 and 1.2.0p6
Garrick Staples
garrick at usc.edu
Thu Dec 15 12:52:50 MST 2005
The report I saw earlier said that pbs_iff was working correctly on
HPUX. Was that report wrong?
Now that you have backported net_client.c, is everything working correctly?
On Thu, Dec 15, 2005 at 01:29:29PM -0600, Mike Coyne alleged:
> Sorry about the direct email; I don't seem to be able to post to
> torqueusers at supercluster.org.
>
> I don't know if this is similar to your previous problems with HPUX? I
> can't seem to get past version 2.0.0p2.
>
> ________________________________
>
> From: Mike Coyne
> Sent: Thursday, December 15, 2005 1:06 PM
> To: torqueusers at supercluster.org
> Subject: HPUX 11 failure torque 2.0.0p1,2,3 and 1.2.0p6
>
> There are some issues with HPUX and torque in versions after 1.2.0p5
> surrounding pbs_iff, on both the client and the server side. On the
> client side, the problem is in src/lib/Libnet/net_client.c.
>
> Below is a diff between 1.2.0p5 and 2.0.0p3. In order to get pbs_iff
> to connect from a remote host (one of the mom clients), I had to backport
> the older version of this file.
>
> *** net_client.c Fri Jun 3 11:41:18 2005
> --- /home/mcoyne/torque/torque-2.0.0p3/src/lib/Libnet/net_client.c Wed Nov 23 23:45:52 2005
> ***************
> *** 105,120 ****
>
> {
> fd_set fs;
> ! int n, val, len, rc;
> struct timeval tv;
>
> ! tv.tv_sec = (time_t)timeout;
> tv.tv_usec = 0;
>
> FD_ZERO(&fs);
> FD_SET(sockd,&fs);
>
> ! if ((n = select(FD_SETSIZE,0,&fs,0,&tv)) != 1)
> {
> /* FAILURE: socket not ready for write */
>
> --- 105,122 ----
>
> {
> fd_set fs;
> ! int n, val, rc;
> struct timeval tv;
>
> ! socklen_t len;
> !
> ! tv.tv_sec = timeout;
> tv.tv_usec = 0;
>
> FD_ZERO(&fs);
> FD_SET(sockd,&fs);
>
> ! if ((n = select(sockd+1,0,&fs,0,&tv)) != 1)
> {
> /* FAILURE: socket not ready for write */
>
> ***************
> *** 170,176 ****
> int sock;
> unsigned short tryport;
> int flags;
> ! int one = 1;
>
> local.sin_family = AF_INET;
> local.sin_addr.s_addr = 0;
> --- 172,178 ----
> int sock;
> unsigned short tryport;
> int flags;
> ! long one = 1;
>
> local.sin_family = AF_INET;
> local.sin_addr.s_addr = 0;
> ***************
> *** 214,225 ****
> sock,
> SOL_SOCKET,
> SO_REUSEADDR,
> ! (const char*)&one,
> ! sizeof(one));
>
> local.sin_port = htons(tryport);
>
> #if !defined(__TDARWIN) || defined(__TDARWINBIND)
> while (bind(sock,(struct sockaddr *)&local,sizeof(local)) < 0)
> {
> #ifdef NDEBUG2
> --- 216,228 ----
> sock,
> SOL_SOCKET,
> SO_REUSEADDR,
> ! (void *)&one,
> ! sizeof(void *));
>
> local.sin_port = htons(tryport);
>
> #if !defined(__TDARWIN) || defined(__TDARWINBIND)
> +
> while (bind(sock,(struct sockaddr *)&local,sizeof(local)) < 0)
> {
> #ifdef NDEBUG2
>
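A note on the diff above: the first hunk switches the first argument of select() from FD_SETSIZE to sockd+1 and changes len from int to socklen_t, and the later hunks pass SO_REUSEADDR a long with a length of sizeof(void *) instead of an int with sizeof(one). The hunk does not show where len is used; the minimal sketch below assumes it is the length argument of a getsockopt(SO_ERROR) check, which is the usual way to finish a non-blocking connect. It shows the conventional POSIX form of both steps and is not the TORQUE code or a confirmed fix for HPUX; the helper names wait_for_connect() and set_reuseaddr() are hypothetical.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/select.h>
#include <sys/time.h>

/*
 * Hypothetical helper (not TORQUE's API): wait up to `timeout` seconds
 * for a non-blocking connect() on `sockd` to complete.
 * Returns 0 if the socket became writable with no pending error, -1 otherwise.
 */
int wait_for_connect(int sockd, int timeout)
{
  fd_set fs;
  struct timeval tv;
  int val = 0;
  socklen_t len = sizeof(val);   /* in/out length for getsockopt() */

  tv.tv_sec = (time_t)timeout;
  tv.tv_usec = 0;

  FD_ZERO(&fs);
  FD_SET(sockd, &fs);

  /* nfds only has to be one more than the highest descriptor in the set */
  if (select(sockd + 1, NULL, &fs, NULL, &tv) != 1)
    return -1;                   /* timed out, or select() failed */

  /* A writable socket may still carry a failed connect in SO_ERROR */
  if (getsockopt(sockd, SOL_SOCKET, SO_ERROR, &val, &len) < 0 || val != 0)
    return -1;

  return 0;
}

/*
 * Hypothetical helper: enable SO_REUSEADDR with an int flag, passing
 * the size of that int (not the size of a pointer) as the option length.
 * Returns the setsockopt() result: 0 on success, -1 on error.
 */
int set_reuseaddr(int sock)
{
  int one = 1;

  return setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));
}
```

In use, wait_for_connect() would be called after connect() on a non-blocking socket returns -1 with errno set to EINPROGRESS. Passing sizeof(one) keeps the option length tied to the type of the flag that is actually handed to setsockopt().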
> In order to get src/resmom/hpux11/mom_mach.c (or the hpux10 version) to
> compile, I added
>
> extern int ignwalltime;
>
> to this file after the includes.
>
> The remaining problem is as follows. From TORQUE 2.0.0p2 onward (a
> guess here), pbs_iff disconnects with an invalid credential error,
> PBSE_BADCRED, which is set in src/server/process_request.c:
>
> if (svr_conn[sfds].cn_authen != PBS_NET_CONN_AUTHENTICATED)
>   rc = PBSE_BADCRED;
> else
>   rc = authenticate_user(request, &conn_credent[sfds]);
>
> ....
>
> The svr_conn entry printed by gdb has a suspicious cn_addr, given that
> the connection came from a qstat on the same host as the server,
> although this may be fallout from a previous authentication error.
>
> .....
>
> Breakpoint 4, process_request (sfds=17) at process_request.c:369
> 369 if (svr_conn[sfds].cn_authen != PBS_NET_CONN_AUTHENTICATED)
> (gdb) s
> 370 rc = PBSE_BADCRED;
> (gdb) print svr_conn[sfds]
> $1 = {cn_addr = 2147483649, cn_handle = -1, cn_port = 40696, cn_authen = 0,
>   cn_active = FromClientDIS, cn_lasttime = 1134672011,
>   cn_func = 0x400000000001ad30 <.opd+1360>, cn_oncl = 0}
> (gdb) print PBS_NET_CONN_AUTHENTICATED
> No symbol "PBS_NET_CONN_AUTHENTICATED" in current context.
>
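For the gdb output above: assuming cn_addr holds an IPv4 address as a 32-bit value in host byte order (an assumption, not checked against the TORQUE source), 2147483649 is 0x80000001, which reads as 128.0.0.1, while the loopback address 127.0.0.1 would be 0x7F000001, or 2130706433. The hypothetical helpers below do that decoding.

```c
/*
 * Hypothetical helpers for reading an IPv4 address stored as a 32-bit
 * integer in host byte order (an assumption about cn_addr, not verified
 * against the TORQUE source).
 */

/* Split addr into its four dotted-quad octets, most significant first. */
void split_ipv4(unsigned long addr, unsigned int octet[4])
{
  octet[0] = (unsigned int)((addr >> 24) & 0xffUL);
  octet[1] = (unsigned int)((addr >> 16) & 0xffUL);
  octet[2] = (unsigned int)((addr >> 8) & 0xffUL);
  octet[3] = (unsigned int)(addr & 0xffUL);
}

/* Nonzero if addr falls in 127.0.0.0/8, the IPv4 loopback block. */
int is_loopback(unsigned long addr)
{
  return ((addr >> 24) & 0xffUL) == 127UL;
}
```

For example, split_ipv4(2147483649UL, o) fills o with 128, 0, 0, 1, and is_loopback() returns 0 for that value.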
--
Garrick Staples, Linux/HPCC Administrator
University of Southern California