[torqueusers] Re: FW: HPUX 11 failure torque 2.0.0p1, 2, 3 and 1.2.0p6

Garrick Staples garrick at usc.edu
Thu Dec 15 12:52:50 MST 2005


The report I saw earlier said that pbs_iff was working correctly on
HPUX.  Was that report wrong?

Once you backported net_client.c, is everything now working correctly?
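
For reference, the setsockopt() hunk in the diff quoted below is worth a look: the 2.0.0p3 side passes a long with a length of sizeof(void *), while POSIX documents the SO_REUSEADDR value as an int. On a 64-bit big-endian build those two forms are not equivalent, although that is not confirmed here as the cause of the pbs_iff failure. A minimal sketch of the int form, where set_reuseaddr() and get_reuseaddr() are hypothetical helper names and not TORQUE functions:

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

/* Hypothetical helper (not part of TORQUE): enable SO_REUSEADDR on sock.
 * The option value is passed as an int with sizeof(int), which is what
 * POSIX specifies for this option.  Returns 0 on success, -1 on error. */
int set_reuseaddr(int sock)
{
  int one = 1;

  return setsockopt(sock, SOL_SOCKET, SO_REUSEADDR,
                    (const void *)&one, sizeof(one));
}

/* Hypothetical helper: read the option back.
 * Returns 1 if set, 0 if clear, -1 on error. */
int get_reuseaddr(int sock)
{
  int val = 0;
  socklen_t len = sizeof(val);

  if (getsockopt(sock, SOL_SOCKET, SO_REUSEADDR, (void *)&val, &len) < 0)
    return -1;

  return (val != 0);
}
```

Reading the option back with get_reuseaddr() right after setting it is a cheap way to check on a given platform that the value actually took effect.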

On Thu, Dec 15, 2005 at 01:29:29PM -0600, Mike Coyne alleged:
> Sorry about the direct email; I don't seem to be able to submit to
> torqueusers at supercluster.org.
> 
> I don't know if this is similar to your previous problems with HPUX.  I
> can't seem to get past version 2.0.0p2.
> 
> ________________________________
> 
> From: Mike Coyne 
> Sent: Thursday, December 15, 2005 1:06 PM
> To: torqueusers at supercluster.org
> Subject: HPUX 11 failure torque 2.0.0p1,2,3 and 1.2.0p6
> 
>  
> 
> There are some issues with HPUX and torque in versions after 1.2.0p5
> involving pbs_iff, on both the client and the server side.  On the
> client side, the file involved is src/lib/Libnet/net_client.c.
> 
> Below is a diff between 2.0.0p3 and 1.2.0p5.  To get pbs_iff to connect
> from a remote host (one of the mom clients), I had to backport the
> older version of this file.
> 
> *** net_client.c        Fri Jun  3 11:41:18 2005
> --- /home/mcoyne/torque/torque-2.0.0p3/src/lib/Libnet/net_client.c      Wed Nov 23 23:45:52 2005
> ***************
> *** 105,120 ****
>   
>     {
>     fd_set fs;
> !   int n, val, len, rc;
>     struct timeval tv;
>   
> !   tv.tv_sec = (time_t)timeout;
>     tv.tv_usec = 0;
>   
>     FD_ZERO(&fs);
>     FD_SET(sockd,&fs);
>   
> !   if ((n = select(FD_SETSIZE,0,&fs,0,&tv)) != 1)
>       {
>       /* FAILURE:  socket not ready for write */
>   
> --- 105,122 ----
>   
>     {
>     fd_set fs;
> !   int n, val, rc;
>     struct timeval tv;
>   
> !   socklen_t len;
> ! 
> !   tv.tv_sec = timeout;
>     tv.tv_usec = 0;
>   
>     FD_ZERO(&fs);
>     FD_SET(sockd,&fs);
>   
> !   if ((n = select(sockd+1,0,&fs,0,&tv)) != 1)
>       {
>       /* FAILURE:  socket not ready for write */
>   
> ***************
> *** 170,176 ****
>     int                sock;
>     unsigned short     tryport;
>     int                flags;
> !   int                one = 1;
>   
>     local.sin_family = AF_INET;
>     local.sin_addr.s_addr = 0;
> --- 172,178 ----
>     int                sock;
>     unsigned short     tryport;
>     int                flags;
> !   long               one = 1;
>   
>     local.sin_family = AF_INET;
>     local.sin_addr.s_addr = 0;
> ***************
> *** 214,225 ****
>         sock,
>         SOL_SOCKET,
>         SO_REUSEADDR,
> !       (const char*)&one,
> !       sizeof(one));
>   
>       local.sin_port = htons(tryport);
>   
>   #if !defined(__TDARWIN) || defined(__TDARWINBIND)
>       while (bind(sock,(struct sockaddr *)&local,sizeof(local)) < 0)
>         {
>   #ifdef NDEBUG2
> --- 216,228 ----
>         sock,
>         SOL_SOCKET,
>         SO_REUSEADDR,
> !       (void *)&one,
> !       sizeof(void *));
>   
>       local.sin_port = htons(tryport);
>   
>   #if !defined(__TDARWIN) || defined(__TDARWINBIND)
> + 
>       while (bind(sock,(struct sockaddr *)&local,sizeof(local)) < 0)
>         {
>   #ifdef NDEBUG2
> 
> To get src/resmom/hpux11/mom_mach.c (or the hpux10 equivalent) to
> compile, I added
> 
>   extern  int     ignwalltime;
> 
> to this file after the includes.
> 
> The remaining problem is as follows.  pbs_iff disconnects with an
> invalid credential (PBSE_BADCRED), set in src/server/process_request.c,
> from torque 2.0.0p2 onward (a guess here):
> 
>     if (svr_conn[sfds].cn_authen != PBS_NET_CONN_AUTHENTICATED)
>       rc = PBSE_BADCRED;
>     else
>       rc = authenticate_user(request, &conn_credent[sfds]);
> 
> ....
> 
> The gdb output for svr_conn shows a suspicious cn_addr; the connection
> was from a qstat on the same host as the server.  This may, however, be
> fallout from a previous authentication error.
> 
> .....
> 
> Breakpoint 4, process_request (sfds=17) at process_request.c:369
> 369         if (svr_conn[sfds].cn_authen != PBS_NET_CONN_AUTHENTICATED)
> (gdb) s
> 370                      rc = PBSE_BADCRED;
> (gdb) print svr_conn[sfds]
> $1 = {cn_addr = 2147483649, cn_handle = -1, cn_port = 40696, cn_authen = 0,
>   cn_active = FromClientDIS, cn_lasttime = 1134672011,
>   cn_func = 0x400000000001ad30 <.opd+1360>, cn_oncl = 0}
> (gdb) print  PBS_NET_CONN_AUTHENTICATED
> No symbol "PBS_NET_CONN_AUTHENTICATED" in current context.
> 
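
The cn_addr value in the gdb output above can be decoded by hand. The sketch below assumes cn_addr holds an IPv4 address in host byte order (an assumption; the quoted message does not say), and addr_to_dotted() is a hypothetical helper, not a TORQUE function. Under that assumption, 2147483649 is 0x80000001, which reads as 128.0.0.1 and not as the loopback address 127.0.0.1 (0x7F000001, or 2130706433).

```c
#include <stdio.h>
#include <stdint.h>

/* Hypothetical helper (not part of TORQUE): format an IPv4 address given
 * in host byte order as a dotted quad.  buf should hold at least 16 bytes. */
void addr_to_dotted(uint32_t addr, char *buf, size_t buflen)
{
  snprintf(buf, buflen, "%u.%u.%u.%u",
           (unsigned)((addr >> 24) & 0xff),   /* most significant octet */
           (unsigned)((addr >> 16) & 0xff),
           (unsigned)((addr >> 8) & 0xff),
           (unsigned)(addr & 0xff));          /* least significant octet */
}
```

Calling addr_to_dotted(2147483649u, buf, sizeof(buf)) fills buf with "128.0.0.1". Whether that matches any address actually configured on the server host is something to check on the machine itself.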

-- 
Garrick Staples, Linux/HPCC Administrator
University of Southern California