[torqueusers] APAC LDAP Friendly init_groups() - an acceptable
alternative to the current one ?
Dave Jackson
jacksond at clusterresources.com
Mon Oct 31 08:29:46 MST 2005
David, Chris,
The LDAP-friendly MOM group code has been integrated and is enabled by
default. If any issues are detected, it can be disabled by setting the
#define __TOLDGROUP in pbs_config.h.
Please let us know if you see any behavioral changes or performance
issues associated with this release. The change is available in the
latest TORQUE-2.0.0 pre patch 1 snapshot.
Dave
On Mon, 2005-10-31 at 11:47 +1100, David Singleton wrote:
> Chris,
>
> There was a slight problem with that code that actually crashed SGI CXFS
> servers (problem of too many gids). Please modify the code with:
>
> -  savedgroups[nsaved++]=getegid();
> +  {
> +    gid_t momegid = getegid();
> +    int i, found=0;
> +    for(i=0;i<nsaved && !found;i++)
> +      found = (savedgroups[i] == momegid);
> +    if (!found) savedgroups[nsaved++]=getegid();
> +  }
>
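For readers who want to try the duplicate check outside of pbs_mom, here is a
minimal standalone sketch of the same idea. The helper name
append_gid_if_missing and its cap parameter are hypothetical and are not part
of TORQUE; the only assumption is a POSIX gid_t.

```c
#include <sys/types.h>

/* Append gid to list[0..*n) only if it is not already present.
 * Hypothetical helper, not TORQUE code.
 * Returns 1 if the gid was appended, 0 if it was already in the
 * list or if the list has no room left (cap entries in total). */
static int append_gid_if_missing(gid_t *list, int *n, int cap, gid_t gid)
{
    int i;

    for (i = 0; i < *n; i++) {
        if (list[i] == gid)
            return 0;           /* already saved, do not add it twice */
    }
    if (*n >= cap)
        return 0;               /* no room, leave the list unchanged */
    list[(*n)++] = gid;
    return 1;
}
```

In the patched function this would correspond to something like
append_gid_if_missing(savedgroups, &nsaved, NGROUPS_MAX+1, getegid()), so the
list later handed to setgroups() never carries the egid twice.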
> And yes, we use this code on Linux and used it on Tru64.
>
> David
>
>
> Chris Samuel wrote:
> > Hi folks,
> >
> > Back in April I wrote about the pain I was having on AIX with secondary groups
> > being discarded due to AIX not implementing LDAP functionality into the
> > getgrent() library call (and not documenting the fact).
> >
> > In reply Dave Singleton from APAC kindly posted a replacement init_groups()
> > which is much more LDAP friendly, the posting of which I've attached to this
> > email.
> >
> > I've been using this on the AIX node in question with success ever since, and
> > I believe that APAC are using this code on their various clusters (the now
> > defunct Tru64 Alpha SC as well as their Linux and Altix clusters), though
> > Dave can confirm/refute this.. :-)
> >
> > Anyway, it would be really nice to see this folded into Torque so I didn't
> > need to remember to patch it each time I touched the AIX box in question.
> >
> > The problem is that this code is shared across all architectures, so I don't
> > know whether people would be happy to use this code across the lot or whether
> > it would be necessary to make it a per-architecture implementation ?
> >
> > Feedback appreciated!
> >
> > Chris
> >
> >
> > ------------------------------------------------------------------------
> >
> > Subject:
> > Re: [torqueusers] Broken AIX getgrent() results in supplementary groups
> > not set for LDAP users PBS jobs
> > From:
> > David Singleton <David.Singleton at anu.edu.au>
> > Date:
> > Thu, 28 Apr 2005 10:21:10 +1000
> > To:
> > Chris Samuel <csamuel at vpac.org>
> > CC:
> > David Houlder <djh900 at anusf.anu.edu.au>, torqueusers at supercluster.org
> >
> >
> >
> >
> > Chris Samuel wrote:
> >
> >> This took me ages to track down as I assumed that I wasn't looking at
> >> an OS bug.
> >>
> >> The symptoms were a user who had a job that was supposed to write its
> >> output files into a directory that was group writeable by one of his
> >> secondary groups, but owned by a different user. It worked on Linux
> >> but failed on AIX.
> >>
> >> Running the 'id' command from the command line showed all the correct
> >> groups, but when the 'id' command ran via a PBS job only the primary
> >> group was listed.
> >>
> >> Unfortunately when I ran the 'id' command through a PBS job it worked
> >> fine and my supplementary groups were listed properly.
> >>
> >> I then realised that the user with the problem is an LDAP user whilst
> >> my user was a local user (because I need to login if the LDAP server
> >> fails). Creating myself an LDAP account with the same groups as my
> >> local account duplicated the problem, only my primary group was listed
> >> by 'id' when that was run as a PBS job.
> >>
> >> The generic code in pbs_mom for getting a user's supplementary
> >> groups, init_groups() in src/resmom/start_exec.c, uses getgrent() to
> >> cycle through all the groups searching for all those a particular user
> >> is in.
> >>
> >> I wrote a simple program to effectively do the same thing to just
> >> tally the number of groups that it found for a user using getgrent(),
> >> and was amazed to see that it found 2 for the local user and 0 for the
> >> LDAP users!
> >>
> >> Digging around on Google confirmed my suspicion that getgrent() on AIX
> >> is broken for LDAP users. I found this PDF file:
> >>
> >> http://www-1.ibm.com/servers/aix/whitepapers/ldap_naming.pdf
> >>
> >> which says on page 5:
> >>
> >> Many of the getxxxent() calls are not suitable for the LDAP
> >> environment, and as a result they are not nis_ldap enabled
> >> even though they are listed in the RFC2307 APIs:
> >> getpwent () getspnam ()
> >> getspent ()
> >> getgrent ()
> >> getservent ()
> >> getprotoent ()
> >> gethostent ()
> >> getnetent ()
> >>
> >> RFC2307 is "An Approach for Using LDAP as a Network Information Service"
> >>
> >> Any ideas ?
> >>
> >
> > Interestingly, even if init_groups() in start_exec.c does work with
> > LDAP, it can thrash your LDAP server by making lots of requests
> > (depending on how nss_ldap works).
> >
> >
> > Here is an LDAP friendly init_groups() written by David Houlder here
> > at ANUSF. Uses getgroups() and initgroups().
> >
> >
> > /*
> >  * init_groups - build the group list via an LDAP friendly method
> >  */
> >
> > int init_groups(char *pwname,   /* User's name */
> >                 int pwgrp,      /* User's group from pw entry */
> >                 int groupsize,  /* size of the array, following argument */
> >                 int *groups)    /* ptr to group array, list built there */
> > {
> >
> >   /* DJH Jan 2004. The original implementation looped over all groups
> >      looking for membership. That's OK for /etc/group, but thrashes LDAP
> >      if you're using that for groups in nsswitch.conf. Since there is an
> >      explicit LDAP backend to do initgroups (3) efficiently in nss_ldap
> >      (on Linux), let's use initgroups() to figure out the group
> >      membership. A little clunky, but not too ugly. */
> >
> >
> >   extern sigset_t allsigs; /* set up at the start of mom_main */
> >   sigset_t savedset;
> >
> >   int n, nsaved;
> >   gid_t savedgroups[NGROUPS_MAX+1]; /* plus one for the egid below */
> >
> >   /* save current group access because we're about to overwrite it */
> >   nsaved=getgroups(NGROUPS_MAX, savedgroups);
> >   if (nsaved<0) {
> >     log_err(errno, "init_groups", "getgroups");
> >     return -1;
> >   }
> >   /* From the Linux man page: It is unspecified whether the effective
> >      group ID of the calling process is included in the returned
> >      list. (Thus, an application should also call getegid(2) and add
> >      or remove the resulting value.)
> >   */
> >   savedgroups[nsaved++]=getegid();
> >
> >   if (pwgrp==0) {
> >     /* Emulate the original init_groups() behaviour which treated
> >        gid==0 as a special case */
> >     struct passwd *pwe=getpwnam(pwname);
> >     if (pwe==NULL) {
> >       log_err(errno, "init_groups", "no such user");
> >       return -1;
> >     }
> >     pwgrp=pwe->pw_gid;
> >   }
> >   /* Block signals while we do this or else the signal handler might
> >      run with strange group access */
> >   if (sigprocmask(SIG_BLOCK, &allsigs, &savedset) == -1) {
> >     log_err(errno, "init_groups", "sigprocmask(BLOCK)");
> >     return -1;
> >   }
> >   n=0;
> >   if (initgroups(pwname, pwgrp)<0) {
> >     log_err(errno, "init_groups", "initgroups");
> >     n=-1;
> >   } else {
> >     n=getgroups(groupsize, (gid_t *)groups);
> >   }
> >   /* restore state */
> >   if (setgroups(nsaved, savedgroups)<0)
> >     log_err(errno, "init_groups", "setgroups");
> >   if (sigprocmask(SIG_SETMASK, &savedset, NULL) == -1)
> >     log_err(errno, "init_groups", "sigprocmask(SIG_SETMASK)");
> >
> >   return n;
> > }
> >
> >
> >
> >
> > ------------------------------------------------------------------------
> >
> > _______________________________________________
> > torqueusers mailing list
> > torqueusers at supercluster.org
> > http://www.supercluster.org/mailman/listinfo/torqueusers
>
>