[torqueusers] APAC LDAP Friendly init_groups() - an acceptable alternative to the current one ?

Dave Jackson jacksond at clusterresources.com
Mon Oct 31 08:29:46 MST 2005


David, Chris,

  The LDAP-friendly MOM group code has been integrated and is enabled by
default.  If any issues are detected, it can be disabled by defining
__TOLDGROUP in pbs_config.h.

  Please let us know if you see any behavioral changes or performance
issues associated with this release.  The change is available in the
latest TORQUE-2.0.0 pre patch 1 snapshot.

Dave

On Mon, 2005-10-31 at 11:47 +1100, David Singleton wrote:
> Chris,
> 
> There was a slight problem with that code that actually crashed SGI CXFS
> servers (problem of too many gids).  Please modify the code with:
> 
> -    savedgroups[nsaved++]=getegid();
> +    {
> +       gid_t momegid = getegid();
> +       int i, found=0;
> +       for(i=0;i<nsaved && !found;i++)
> +           found = (savedgroups[i] == momegid);
> +       if (!found) savedgroups[nsaved++]=momegid;
> +    }
> 
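The patch above only appends the effective gid when it is not already in the saved list, so the list never grows by a duplicate entry. The same idea can be sketched as a small standalone helper. The name append_gid_unique is hypothetical and is not part of TORQUE; the caller is assumed to have left room for one more entry, as the NGROUPS_MAX+1 array in init_groups() does.

```c
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical helper, not part of TORQUE: append gid to list[0..n) only
 * if it is not already present.  Returns the new element count.  The
 * caller must guarantee that list has room for one more entry. */
size_t append_gid_unique(gid_t *list, size_t n, gid_t gid)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (list[i] == gid)
            return n;       /* already present: leave the list unchanged */
    }
    list[n] = gid;          /* not found: append it */
    return n + 1;
}
```

With a helper like this, the patched lines would reduce to something along the lines of nsaved = append_gid_unique(savedgroups, nsaved, getegid()), with nsaved converted to the appropriate integer type.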
> And yes, we use this code on Linux and used it on Tru64.
> 
> David
> 
> 
> Chris Samuel wrote:
> > Hi folks,
> > 
> > Back in April I wrote about the pain I was having on AIX with secondary groups 
> > being discarded because AIX does not implement LDAP support in the 
> > getgrent() library call (and does not document the fact).
> > 
> > In reply Dave Singleton from APAC kindly posted a replacement init_groups() 
> > which is much more LDAP friendly, the posting of which I've attached to this 
> > email.
> > 
> > I've been using this on the AIX node in question with success ever since, and 
> > I believe that APAC are using this code on their various clusters (the now 
> > defunct Tru64 Alpha SC as well as their Linux and Altix clusters), though 
> > Dave can confirm/refute this.. :-)
> > 
> > Anyway, it would be really nice to see this folded into Torque so I didn't 
> > need to remember to patch it each time I touched the AIX box in question.
> > 
> > The problem is that this code is shared across all architectures, so I don't 
> > know whether people would be happy to use this code across the lot or whether 
> > it would be necessary to make it a per-architecture implementation ?
> > 
> > Feedback appreciated!
> > 
> > Chris
> > 
> > 
> > ------------------------------------------------------------------------
> > 
> > Subject:
> > Re: [torqueusers] Broken AIX getgrent() results in supplementary groups 
> > not set for LDAP users PBS jobs
> > From:
> > David Singleton <David.Singleton at anu.edu.au>
> > Date:
> > Thu, 28 Apr 2005 10:21:10 +1000
> > To:
> > Chris Samuel <csamuel at vpac.org>
> > CC:
> > David Houlder <djh900 at anusf.anu.edu.au>, torqueusers at supercluster.org
> > 
> > 
> > 
> > 
> > Chris Samuel wrote:
> > 
> >> This took me ages to track down as I assumed that I wasn't looking at 
> >> an OS bug.
> >>
> >> The symptoms were a user who had a job that was supposed to write its 
> >> output files into a directory that was group writeable by one of his 
> >> secondary groups, but owned by a different user.  It worked on Linux 
> >> but failed on AIX.
> >>
> >> Running the 'id' command from the command line showed all the correct 
> >> groups, but when the 'id' command ran via a PBS job only the primary 
> >> group was listed.
> >>
> > >> Unfortunately, when I ran the 'id' command through a PBS job as my 
> > >> own user, it worked fine and my supplementary groups were listed properly.
> >>
> >> I then realised that the user with the problem is an LDAP user whilst 
> >> my user was a local user (because I need to login if the LDAP server 
> >> fails).  Creating myself an LDAP account with the same groups as my 
> >> local account duplicated the problem, only my primary group was listed 
> >> by 'id' when that was run as a PBS job.
> >>
> > >> The generic code in pbs_mom for getting a user's supplementary 
> > >> groups, init_groups() in src/resmom/start_exec.c, uses getgrent() to 
> > >> cycle through all the groups, searching for those a particular user 
> > >> is in.
> >>
> >> I wrote a simple program to effectively do the same thing to just 
> >> tally the number of groups that it found for a user using getgrent(), 
> >> and was amazed to see that it found 2 for the local user and 0 for the 
> >> LDAP users!
> >>
> > >> Digging around on Google confirmed my suspicion that getgrent() on AIX 
> > >> is broken for LDAP users. I found this PDF file:
> >>
> >>     http://www-1.ibm.com/servers/aix/whitepapers/ldap_naming.pdf
> >>
> >> which says on page 5:
> >>
> >>  Many of the getxxxent() calls are not suitable for the LDAP
> >>  environment, and as a result they are not nis_ldap enabled
> >>  even though they are listed in the RFC2307 APIs:  
> > >>  getpwent ()
> > >>  getspnam ()
> >>  getspent ()
> >>  getgrent ()
> >>  getservent ()
> >>  getprotoent ()
> >>  gethostent ()
> >>  getnetent ()
> >>  
> >> RFC2307 is "An Approach for Using LDAP as a Network Information Service"
> >>
> >> Any ideas ?
> >>
> > 
> > Interestingly, even if init_groups() in start_exec.c does work with
> > LDAP, it can thrash your LDAP server by making lots of requests
> > (depending on how nss_ldap works).
> > 
> > 
> > Here is an LDAP-friendly init_groups() written by David Houlder here
> > at ANUSF.  It uses getgroups() and initgroups().
> > 
> > 
> > /*
> >  * init_groups - build the group list via an LDAP friendly method
> >  */
> > 
> > int init_groups(char *pwname,   /* User's name */
> >                 int   pwgrp,    /* User's group from pw entry */
> >                 int   groupsize,/* size of the array, following argument */
> >                 int  *groups)   /* ptr to group array, list build there */
> > {
> > 
> >     /* DJH Jan 2004. The original implementation looped over all groups
> >        looking for membership. That's OK for /etc/group, but thrashes LDAP
> >        if you're using that for groups in nsswitch.conf. Since there is an
> >        explicit LDAP backend to do initgroups(3) efficiently in nss_ldap
> >        (on Linux), let's use initgroups() to figure out the group
> >        membership. A little clunky, but not too ugly.  */
> > 
> > 
> >     extern sigset_t allsigs; /* set up at the start of mom_main */
> >     sigset_t savedset;
> > 
> >     int n, nsaved;
> >     gid_t savedgroups[NGROUPS_MAX+1]; /* plus one for the egid below */
> > 
> >     /* save current group access because we're about to overwrite it */
> >     nsaved=getgroups(NGROUPS_MAX, savedgroups);
> >     if (nsaved<0) {
> >         log_err(errno, "init_groups", "getgroups");
> >         return -1;
> >     }
> >     /* From the Linux man page: It is unspecified whether the effective
> >        group ID of the calling process is included in the returned
> >        list. (Thus, an application should also call getegid(2) and add
> >        or remove the resulting value.)
> >     */
> >     savedgroups[nsaved++]=getegid();
> > 
> >     if (pwgrp==0) {
> >         /* Emulate the original init_groups() behaviour which treated
> >            gid==0 as a special case */
> >         struct passwd *pwe=getpwnam(pwname);
> >         if (pwe==NULL) {
> >             log_err(errno, "init_groups", "no such user");
> >             return -1;
> >         }
> >         pwgrp=pwe->pw_gid;
> >     }
> >     /* Block signals while we do this or else the signal handler might
> >        run with strange group access */
> >     if (sigprocmask(SIG_BLOCK, &allsigs, &savedset) == -1) {
> >         log_err(errno, "init_groups", "sigprocmask(BLOCK)");
> >         return -1;
> >     }
> >     n=0;
> >     if (initgroups(pwname, pwgrp)<0) {
> >         log_err(errno, "init_groups", "initgroups");
> >         n=-1;
> >     } else {
> >         n=getgroups(groupsize, (gid_t *)groups);
> >     }
> >     /* restore state */
> >     if (setgroups(nsaved, savedgroups)<0)
> >         log_err(errno, "init_groups", "setgroups");
> >     if (sigprocmask(SIG_SETMASK, &savedset, NULL) == -1)
> >         log_err(errno, "init_groups", "sigprocmask(SIG_SETMASK)");
> > 
> >     return n;
> > }
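For comparison only: where just the list of gids is needed, some C libraries (glibc and the BSDs, for example) provide getgrouplist(), which fills in a user's group list without modifying the calling process's own groups. This is a different technique from the initgroups()/getgroups() approach in the function above, and the posted code does not use it. The sketch below assumes the glibc signature (gid_t * for the array; some systems use int *), and the helper name lookup_groups is hypothetical.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* exposes getgrouplist() in glibc's <grp.h> */
#endif
#include <grp.h>
#include <sys/types.h>

/* Hypothetical helper, not part of TORQUE: fill groups[0..size) with the
 * gids for `user`, including the primary gid `pwgrp`, without touching
 * the caller's own supplementary group list.  Returns the number of gids
 * stored, or -1 if `size` entries are not enough to hold them all. */
int lookup_groups(const char *user, gid_t pwgrp, gid_t *groups, int size)
{
    int ngroups = size;   /* in: capacity; out: number of groups found */

    if (getgrouplist(user, pwgrp, groups, &ngroups) == -1)
        return -1;        /* list did not fit; ngroups holds the count */
    return ngroups;
}
```

Because nothing about the process is changed, this variant needs neither the save/restore of the group list nor the signal blocking. Whether it is available and behaves well with LDAP on a given platform (AIX in particular) would need to be checked before relying on it.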
> > 
> > 
> > 
> > 
> > ------------------------------------------------------------------------
> > 
> > _______________________________________________
> > torqueusers mailing list
> > torqueusers at supercluster.org
> > http://www.supercluster.org/mailman/listinfo/torqueusers
> 
> 


