[torqueusers] Problems compiling Torque GSSAPI branch

Mike Coyne Mike.Coyne at PACCAR.com
Wed Mar 10 08:33:28 MST 2010


req_accept_forwarded_creds is where the mom authenticates and saves the creds to /tmp/<something that looks like a job name with krb... prepended>.
You might check whether it wrote out the creds. One thing that has helped me when the mom dumps core is to start it in gdb with debugging turned on (PBSDEBUG) to keep it from forking, let it dump, and do a backtrace. Something like:
$ export PBSDEBUG=1
$ gdb /opt/torque/sbin/pbs_mom
(gdb) r <whatever mom options you set>

Then run your job on the node; when it dumps, do a backtrace
(gdb) bt

that way you can see what called the library function that died.
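
As an illustration of what such a backtrace often points to: the kernel log line quoted further down ("segfault at 0 ... error 4 in libc-2.7.so") describes a user-mode read of address 0 inside libc, which usually means a NULL pointer was handed to a libc routine such as strlen or strcpy. The sketch below is not Torque code, and nothing in this thread confirms which call actually failed; `safe_len` and `build_path` are hypothetical helpers that only show the kind of guard that avoids this class of crash.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of Torque: a strlen() that tolerates NULL.
 * Calling strlen(NULL) directly is undefined behaviour and typically
 * faults at address 0 inside libc. */
size_t safe_len(const char *s)
{
  return (s != NULL) ? strlen(s) : 0;
}

/* Hypothetical helper: write "<dir>/<name>" into buf.
 * Returns 0 on success, -1 if an input is NULL or the result does not fit. */
int build_path(char *buf, size_t buflen, const char *dir, const char *name)
{
  int n;

  if (buf == NULL || buflen == 0 || dir == NULL || name == NULL)
    return -1;

  n = snprintf(buf, buflen, "%s/%s", dir, name);

  if (n < 0 || (size_t)n >= buflen)
    return -1;   /* output error or truncated result */

  return 0;
}
```

If the backtrace shows a string function at the top of the stack, checking the caller for an unchecked pointer like this is a reasonable first step.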

Mike
-----Original Message-----
From: Peter Smith [mailto:peter.smith3882100 at gmail.com] 
Sent: Tuesday, March 09, 2010 4:43 PM
To: Mike Coyne
Cc: torqueusers at supercluster.org
Subject: Re: [torqueusers] Problems compiling Torque GSSAPI branch

Hi Mike

SUCCESS, the branch now compiles and installs successfully. Thank you
very much, a whole week had been reserved to find a solution to this
problem, as we do not have any programmers at our site, but with the
help you provided, we can now move along much earlier. Thanks again.

Not much testing has been done yet, as the workday has ended, but I
noticed that pbs_mom on the clients seems to crash when a job is
submitted. This is the very first test of the GSSAPI branch, so maybe
this is just a configuration error on my part, but I will post the
details anyway in case it is a known problem that needs a special fix,
or in case I do not find any errors in my configuration.

On the master, pbs_server and pbs_sched are running, and a test user
with a valid Kerberos ticket submits the job with: echo "sleep 30" | qsub

On the worker, pbs_mom is started with pbs_mom -D and prints the
following at the moment the job is submitted:

MOM is up
do_rpp: got a resource monitor request
do_rpp: got a resource monitor request
Accepting user creds for 7.cluster-master.cluster-test.local
pbs_mom: LOG_ERROR::Resource temporarily unavailable (11) in mom_main,
Caught fatal core signal
Aborted (core dumped)

kern.log output:

2294.863178] pbs_mom[1930]: segfault at 0 ip b7cff3b3 sp bfffd81c
error 4 in libc-2.7.so[b7c89000+155000]

pbs_mom logfile output:

pbs_mom;Svr;req_accept_forwarded_creds;
pbs_mom;Svr;pbs_mom;LOG_ERROR::Resource temporarily unavailable (11)
in mom_main, Caught fatal core signal

pbs_mom coredump read with readelf -a core:

ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              CORE (Core file)
  Machine:                           Intel 80386
  Version:                           0x1
  Entry point address:               0x0
  Start of program headers:          52 (bytes into file)
  Start of section headers:          0 (bytes into file)
  Flags:                             0x0
  Size of this header:               52 (bytes)
  Size of program headers:           32 (bytes)
  Number of program headers:         47
  Size of section headers:           0 (bytes)
  Number of section headers:         0
  Section header string table index: 0

There are no sections in this file.

There are no sections in this file.

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  NOTE           0x000614 0x00000000 0x00000000 0x0021c 0x00000     0
  LOAD           0x001000 0x08048000 0x00000000 0x00000 0x48000 R E 0x1000
  LOAD           0x001000 0x08090000 0x00000000 0x04000 0x04000 RW  0x1000
  LOAD           0x005000 0x08094000 0x00000000 0x100000 0x100000 RW  0x1000
  LOAD           0x105000 0xb7af8000 0x00000000 0xc3000 0xc3000 RW  0x1000
  LOAD           0x1c8000 0xb7bbb000 0x00000000 0x00000 0x04000 R E 0x1000
  LOAD           0x1c8000 0xb7bbf000 0x00000000 0x02000 0x02000 RW  0x1000
  LOAD           0x1ca000 0xb7bc1000 0x00000000 0x00000 0x0a000 R E 0x1000
  LOAD           0x1ca000 0xb7bcb000 0x00000000 0x02000 0x02000 RW  0x1000
  LOAD           0x1cc000 0xb7bd0000 0x00000000 0x00000 0x35000 R   0x1000
  LOAD           0x1cc000 0xb7c05000 0x00000000 0x02000 0x02000 RW  0x1000
  LOAD           0x1ce000 0xb7c07000 0x00000000 0x00000 0x15000 R E 0x1000
  LOAD           0x1ce000 0xb7c1c000 0x00000000 0x02000 0x02000 RW  0x1000
  LOAD           0x1d0000 0xb7c1e000 0x00000000 0x02000 0x02000 RW  0x1000
  LOAD           0x1d2000 0xb7c20000 0x00000000 0x00000 0x10000 R E 0x1000
  LOAD           0x1d2000 0xb7c30000 0x00000000 0x02000 0x02000 RW  0x1000
  LOAD           0x1d4000 0xb7c32000 0x00000000 0x02000 0x02000 RW  0x1000
  LOAD           0x1d6000 0xb7c34000 0x00000000 0x00000 0x02000 R E 0x1000
  LOAD           0x1d6000 0xb7c36000 0x00000000 0x01000 0x01000 RW  0x1000
  LOAD           0x1d7000 0xb7c37000 0x00000000 0x00000 0x02000 R E 0x1000
  LOAD           0x1d7000 0xb7c39000 0x00000000 0x02000 0x02000 RW  0x1000
  LOAD           0x1d9000 0xb7c3b000 0x00000000 0x00000 0x07000 R E 0x1000
  LOAD           0x1d9000 0xb7c42000 0x00000000 0x01000 0x01000 RW  0x1000
  LOAD           0x1da000 0xb7c43000 0x00000000 0x01000 0x01000 RW  0x1000
  LOAD           0x1db000 0xb7c44000 0x00000000 0x00000 0x155000 R E 0x1000
  LOAD           0x1db000 0xb7d99000 0x00000000 0x01000 0x01000 R   0x1000
  LOAD           0x1dc000 0xb7d9a000 0x00000000 0x02000 0x02000 RW  0x1000
  LOAD           0x1de000 0xb7d9c000 0x00000000 0x03000 0x03000 RW  0x1000
  LOAD           0x1e1000 0xb7d9f000 0x00000000 0x00000 0x02000 R E 0x1000
  LOAD           0x1e1000 0xb7da1000 0x00000000 0x01000 0x01000 RW  0x1000
  LOAD           0x1e2000 0xb7da2000 0x00000000 0x00000 0x23000 R E 0x1000
  LOAD           0x1e2000 0xb7dc5000 0x00000000 0x01000 0x01000 RW  0x1000
  LOAD           0x1e3000 0xb7dc6000 0x00000000 0x00000 0x92000 R E 0x1000
  LOAD           0x1e3000 0xb7e58000 0x00000000 0x02000 0x02000 RW  0x1000
  LOAD           0x1e5000 0xb7e5a000 0x00000000 0x00000 0x29000 R E 0x1000
  LOAD           0x1e5000 0xb7e83000 0x00000000 0x01000 0x01000 RW  0x1000
  LOAD           0x1e6000 0xb7e84000 0x00000000 0x01000 0x01000 RW  0x1000
  LOAD           0x1e7000 0xb7e85000 0x00000000 0x00000 0x28000 R E 0x1000
  LOAD           0x1e7000 0xb7ead000 0x00000000 0x01000 0x01000 RW  0x1000
  LOAD           0x1e8000 0xb7eae000 0x00000000 0xb1000 0xb1000 RW  0x1000
  LOAD           0x299000 0xb7f5f000 0x00000000 0x00000 0x02000 R E 0x1000
  LOAD           0x299000 0xb7f61000 0x00000000 0x02000 0x02000 RW  0x1000
  LOAD           0x29b000 0xb7f64000 0x00000000 0x04000 0x04000 RW  0x1000
  LOAD           0x29f000 0xb7f68000 0x00000000 0x01000 0x01000 R E 0x1000
  LOAD           0x2a0000 0xb7f69000 0x00000000 0x00000 0x1a000 R E 0x1000
  LOAD           0x2a0000 0xb7f83000 0x00000000 0x02000 0x02000 RW  0x1000
  LOAD           0x2a2000 0xbffeb000 0x00000000 0x15000 0x15000 RW  0x1000

There is no dynamic section in this file.

There are no relocations in this file.

There are no unwind sections in this file.

No version information found in this file.

Notes at offset 0x00000614 with length 0x0000021c:
  Owner		Data size	Description
  CORE		0x00000090	NT_PRSTATUS (prstatus structure)
  CORE		0x0000007c	NT_PRPSINFO (prpsinfo structure)
  CORE		0x00000090	NT_AUXV (auxiliary vector)
  LINUX		0x00000030	Unknown note type: (0x00000200)


On Tue, Mar 9, 2010 at 3:54 PM, Mike Coyne <Mike.Coyne at paccar.com> wrote:
> This is what I have for that section of svr_chk_owner.c
> I had to take a "swag" at closing the #ifdef as well. Also one of the strcmp's is a duplicate..
>
>
> /* NOTE:  enable case insensitive host check (CRI) */
>
> #ifdef GSSAPI
>  snprintf(uh,uhlen,"%s@%s",user,host);
>  if (!rootprinc) {
>    rootprinc = pbsgss_get_host_princname();
>  }
>  if (!rootprinc) {return 0;}
>  if (strcmp(uh,rootprinc) == 0) {
>    is_root = 1;
> #ifdef PBS_ROOT_ALWAYS_ADMIN
>    return(priv|ATR_DFLAG_MGRD|ATR_DFLAG_MGWR|ATR_DFLAG_OPRD|ATR_DFLAG_OPWR);
> #endif  /* PBS_ROOT_ALWAYS_ADMIN */
>    }
> #endif /* GSSAPI */
> #ifdef __CYGWIN__
>  if (IAmAdminByName(user) && !strcasecmp(host_no_port, server_host))
>    {
>    is_root = 1;
>    return(priv | ATR_DFLAG_MGRD | ATR_DFLAG_MGWR | ATR_DFLAG_OPRD | ATR_DFLAG_OPWR);
>    }
> #else /* __CYGWIN__ */
>  /* Run this even if we aren't doing GSSAPI.  This lets the scheduler run
>     without tickets */
> /*
>  if ((strcmp(user,PBS_DEFAULT_ADMIN) == 0) &&
>      !strcasecmp(host_no_port,server_host))
> */
>  if ((strcmp(user, PBS_DEFAULT_ADMIN) == 0) &&
>      !strcasecmp(host_no_port, server_host))
>    {
>    is_root = 1;
>
> #ifdef PBS_ROOT_ALWAYS_ADMIN
>    return(priv | ATR_DFLAG_MGRD | ATR_DFLAG_MGWR | ATR_DFLAG_OPRD | ATR_DFLAG_OPWR);
> #endif
>    }
> #endif /* __CYGWIN__ guess */
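
The "unterminated #else" error reported in this thread comes from an #ifdef/#else pair that never reaches its #endif. A minimal, self-contained sketch of the balanced layout follows. It is not the Torque source: the flag values, `DEMO_ADMIN`, `DEMO_ROOT_ALWAYS_ADMIN`, and `demo_privilege` are hypothetical stand-ins chosen only so the fragment compiles on its own.

```c
#include <assert.h>
#include <string.h>
#include <strings.h>   /* strcasecmp (POSIX) */

/* Hypothetical stand-ins; the real values live in Torque's headers. */
#define DEMO_FLAG_MGRD 0x01
#define DEMO_FLAG_MGWR 0x02
#define DEMO_FLAG_OPRD 0x04
#define DEMO_FLAG_OPWR 0x08
#define DEMO_ADMIN     "root"

/* Returns privilege bits for user@host. Every #ifdef below is closed by
 * its own #endif, and each brace opened inside a branch closes in it. */
int demo_privilege(const char *user, const char *host, const char *server_host)
{
  int priv = 0;
  int is_root = 0;

#ifdef __CYGWIN__
  /* Windows branch left out of this sketch. */
  (void)user; (void)host; (void)server_host;
#else /* !__CYGWIN__ */
  if ((strcmp(user, DEMO_ADMIN) == 0) &&
      (strcasecmp(host, server_host) == 0))
    {
    is_root = 1;
#ifdef DEMO_ROOT_ALWAYS_ADMIN
    return (priv | DEMO_FLAG_MGRD | DEMO_FLAG_MGWR |
            DEMO_FLAG_OPRD | DEMO_FLAG_OPWR);
#endif /* DEMO_ROOT_ALWAYS_ADMIN */
    }
#endif /* __CYGWIN__ */

  /* Without the always-admin shortcut, root gets manager bits only. */
  if (is_root)
    priv |= (DEMO_FLAG_MGRD | DEMO_FLAG_MGWR);

  return priv;
}
```

Labelling each #else and #endif with the condition it closes, as done here, makes a missing terminator much easier to spot when merging branches.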
>
>
>
> -----Original Message-----
> From: Peter Smith [mailto:peter.smith3882100 at gmail.com]
> Sent: Tuesday, March 09, 2010 8:38 AM
> To: Mike Coyne
> Cc: torqueusers at supercluster.org
> Subject: Re: [torqueusers] Problems compiling Torque GSSAPI branch
>
> Hi Mike
>
> I tried both of the solutions you provided, and both take care of the
> error message related to dis.h, so this is great. I decided to edit out
> A_(...) as in your dis.h, but both ways were tested.
>
> I am definitely getting closer to the finish line, but it seems I
> still have some work to do before getting a successful compile. After
> compiling for quite some time, I get the following error messages:
>
> svr_chk_owner.c:357:1: error: unterminated #else
> make[3]: *** [svr_chk_owner.o] Error 1
> make[3]: Leaving directory `/shared/source/gssapi/src/server'
> make[2]: *** [all-recursive] Error 1
> make[2]: Leaving directory `/shared/source/gssapi/src/server'
> make[1]: *** [all-recursive] Error 1
> make[1]: Leaving directory `/shared/source/gssapi/src'
> make: *** [all-recursive] Error 1
>
> These are the lines from svr_chk_owner.c starting with line 357:
>
> #ifdef __CYGWIN__
>  if (IAmAdminByName(user) && !strcasecmp(host_no_port, server_host))
>    {
>    is_root = 1;
>    return(priv | ATR_DFLAG_MGRD | ATR_DFLAG_MGWR | ATR_DFLAG_OPRD | ATR_DFLAG_OPWR);
>    }
> #else /* __CYGWIN__ */
>  /* Run this even if we aren't doing GSSAPI.  This lets the scheduler run
>     without tickets */
>  if ((strcmp(user,PBS_DEFAULT_ADMIN) == 0) &&
>      !strcasecmp(host_no_port,server_host))
>  if ((strcmp(user, PBS_DEFAULT_ADMIN) == 0) &&
>      !strcasecmp(host_no_port, server_host))
>    {
>    is_root = 1;
>
> I tried to comment out the block shown above, as I do not use Cygwin,
> but then other error messages related to svr_chk_owner.c appear, so I
> think some other problem unrelated to the block may be present.
> Errors when the above block is commented out:
>
> svr_chk_owner.c:380: error: expected identifier or '(' before 'if'
> svr_chk_owner.c:385: error: expected identifier or '(' before 'else'
> svr_chk_owner.c:390: error: expected identifier or '(' before 'if'
> svr_chk_owner.c:395: error: expected identifier or '(' before 'else'
> svr_chk_owner.c:400: error: expected identifier or '(' before 'return'
> svr_chk_owner.c:401: error: expected identifier or '(' before '}' token
> make[3]: *** [svr_chk_owner.o] Error 1
> make[3]: Leaving directory `/shared/source/gssapi/src/server'
> make[2]: *** [all-recursive] Error 1
> make[2]: Leaving directory `/shared/source/gssapi/src/server'
> make[1]: *** [all-recursive] Error 1
> make[1]: Leaving directory `/shared/source/gssapi/src'
> make: *** [all-recursive] Error 1
>
> These are the lines from svr_chk_owner.c starting with line 380:
>
> if (!(server.sv_attr[(int)SRV_ATR_managers].at_flags & ATR_VFLAG_SET))
>    {
>    if (is_root)
>      priv |= (ATR_DFLAG_MGRD | ATR_DFLAG_MGWR);
>    }
>  else if (acl_check(&server.sv_attr[SRV_ATR_managers], uh, ACL_User))
>    {
>    priv |= (ATR_DFLAG_MGRD | ATR_DFLAG_MGWR);
>    }
>
>  if (!(server.sv_attr[(int)SRV_ATR_operators].at_flags & ATR_VFLAG_SET))
>    {
>    if (is_root)
>      priv |= (ATR_DFLAG_OPRD | ATR_DFLAG_OPWR);
>    }
>  else if (acl_check(&server.sv_attr[SRV_ATR_operators], uh, ACL_User))
>    {
>    priv |= (ATR_DFLAG_OPRD | ATR_DFLAG_OPWR);
>    }
>
>  return(priv);
>  }  /* END svr_get_privilege() */
>
> I posted the error messages from when the CYGWIN part is commented out
> because they might help solve the problem, but I am not sure I am on
> the right track.
>
>
> On Tue, Mar 9, 2010 at 2:26 PM, Mike Coyne <Mike.Coyne at paccar.com> wrote:
>> You might try adding
>> #define A_(x) x
>> to your src/include/pbs_config.h and pbs_config.h.in to get rid of it,
>> or just edit out the A_(...). It's a carry-over from the original
>> torque gssapi version that did not get fixed when it was updated to the
>> 2.4 code level.
>> Mine looks more like this:
>>
>>
>> /* the following routines set/control DIS over tcp */
>>
>> extern void DIS_tcp_reset (int fd, int rw);
>> extern void DIS_tcp_setup (int fd);
>> extern int  DIS_tcp_wflush (int fd);
>> extern void DIS_tcp_settimeout (long timeout);
>> extern int  DIS_tcp_istimeout (int fd);
>> extern void DIS_tcp_release (int fd);
>> #ifdef GSSAPI
>> extern void DIS_tcp_set_gss (int fd, gss_ctx_id_t ctx, OM_uint32 flags);
>> #endif
>>
>>
>> extern int  PConnTimeout(int);
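
For background, `A_(x)` is the sort of macro older C code used to keep prototypes usable with pre-ANSI compilers: when prototypes are supported it expands to its argument, so `A_((int fd))` becomes `(int fd)`. With no definition in scope, the compiler sees an unknown identifier after the function name, which matches the dis.h error reported in this thread. A minimal sketch, in which `demo_release` and `demo_last_fd` are hypothetical names rather than anything from Torque:

```c
#include <assert.h>

/* Expand to the argument list itself, as suggested in this thread. */
#define A_(x) x

/* With the macro defined, this declaration is read as
 *   extern int demo_release(int fd);                     */
extern int demo_release A_((int fd));

/* Hypothetical state so the call has an observable effect. */
int demo_last_fd = -1;

/* Records the descriptor it was asked to release and returns it. */
int demo_release(int fd)
{
  demo_last_fd = fd;
  return fd;
}
```

Removing the `#define` line reproduces the same kind of "expected '=', ',', ';' ... before 'A_'" diagnostic on the declaration.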
>>
>> -----Original Message-----
>> From: torqueusers-bounces at supercluster.org
>> [mailto:torqueusers-bounces at supercluster.org] On Behalf Of Peter Smith
>> Sent: Monday, March 08, 2010 6:54 PM
>> To: torqueusers at supercluster.org
>> Subject: [torqueusers] Problems compiling Torque GSSAPI branch
>>
>> Hi
>>
>> I am trying to compile the GSSAPI branch of Torque on a Debian Lenny
>> system.
>>
>> I run configure with the following options:
>>
>> ./configure --with-default-server=cluster-master
>> --with-server-home=/var/spool/pbs --with-rcp=scp --with-gssapi
>> --disable-unixsockets
>>
>> No errors are returned and the last string is "Ready for 'make'"
>>
>> Then I run make, and after 2-3 seconds the following error message is
>> shown:
>>
>> ../../../src/include/dis.h:250: error: expected '=', ',', ';', 'asm'
>> or '__attribute__' before 'A_'
>> make[3]: *** [dis.lo] Error 1
>> make[3]: Leaving directory `/shared/source/gssapi/src/lib/Libpbs'
>> make[2]: *** [all-recursive] Error 1
>> make[2]: Leaving directory `/shared/source/gssapi/src/lib'
>> make[1]: *** [all-recursive] Error 1
>> make[1]: Leaving directory `/shared/source/gssapi/src'
>> make: *** [all-recursive] Error 1
>>
>> If I run configure without the --with-gssapi and --disable-unixsockets
>> options, exactly the same error message is shown, so this makes no
>> difference. On the same system, if I download torque-2.4.6.tar.gz,
>> that package compiles and installs fine without errors.
>>
>> These are a couple of lines from dis.h; the line numbers were added by me:
>>
>> 249: extern int  DIS_tcp_istimeout (int fd);
>> 250: extern void DIS_tcp_release A_((int fd));
>> 251: #ifdef GSSAPI
>> 252: extern void DIS_tcp_set_gss A_((int fd, gss_ctx_id_t ctx, OM_uint32 flags));
>> 253: #endif
>>
>> Does anybody have a suggestion on how I can solve this problem? I
>> would really like to get this working.
>> _______________________________________________
>> torqueusers mailing list
>> torqueusers at supercluster.org
>> http://www.supercluster.org/mailman/listinfo/torqueusers
>>
>>
>>
>
>
>



