[torqueusers] APAC LDAP Friendly init_groups() - an acceptable alternative to the current one?
David Singleton
David.Singleton at anu.edu.au
Sun Oct 30 17:47:33 MST 2005
Chris,
There was a slight problem with that code that actually crashed SGI CXFS
servers (the problem was too many gids). Please modify the code as follows:
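-  savedgroups[nsaved++]=getegid();
+  {
+    /* Only append the egid if getgroups() has not already returned it,
+       so the list later handed to setgroups() does not grow too long. */
+    gid_t momegid = getegid();
+    int i, found=0;
+    for(i=0;i<nsaved && !found;i++)
+      found = (savedgroups[i] == momegid);
+    if (!found) savedgroups[nsaved++]=momegid;
+  }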
And yes, we use this code on Linux and used it on Tru64.
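The point of the patch is that getgroups() may already have returned the egid, and appending it a second time can push the saved list past NGROUPS_MAX. As a sketch only, the same check can be pulled into a small helper. The name append_gid_unique and the explicit capacity check are assumptions added here, not part of the patch.

```c
#include <sys/types.h>

/* Hypothetical helper (not part of the posted patch): append gid to
 * list[0..count-1] unless it is already present. The returned count
 * never exceeds cap, so a later setgroups(count, list) call is not
 * handed a duplicate egid or more entries than the buffer allows. */
int append_gid_unique(gid_t *list, int count, int cap, gid_t gid)
{
    int i;

    for (i = 0; i < count; i++)
        if (list[i] == gid)
            return count;        /* already listed: nothing to add */

    if (count < cap)
        list[count++] = gid;     /* room left: append it */

    return count;
}
```

In init_groups() the patched line would then become something like `nsaved = append_gid_unique(savedgroups, nsaved, NGROUPS_MAX, getegid());`, capping the saved list at NGROUPS_MAX so the restoring setgroups() call stays within the system limit.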
David
Chris Samuel wrote:
> Hi folks,
>
> Back in April I wrote about the pain I was having on AIX with secondary groups
> being discarded because AIX does not implement LDAP support in the
> getgrent() library call (and does not document the fact).
>
> In reply Dave Singleton from APAC kindly posted a replacement init_groups()
> which is much more LDAP friendly, the posting of which I've attached to this
> email.
>
> I've been using this on the AIX node in question with success ever since, and
> I believe that APAC are using this code on their various clusters (the now
> defunct Tru64 Alpha SC as well as their Linux and Altix clusters), though
> Dave can confirm or refute this. :-)
>
> Anyway, it would be really nice to see this folded into Torque so I didn't
> need to remember to patch it each time I touched the AIX box in question.
>
> The problem is that this code is shared across all architectures, so I don't
> know whether people would be happy to use it across the lot or whether it
> would need to be a per-architecture implementation?
>
> Feedback appreciated!
>
> Chris
>
>
> ------------------------------------------------------------------------
>
> Subject:
> Re: [torqueusers] Broken AIX getgrent() results in supplementary groups
> not set for LDAP users PBS jobs
> From:
> David Singleton <David.Singleton at anu.edu.au>
> Date:
> Thu, 28 Apr 2005 10:21:10 +1000
> To:
> Chris Samuel <csamuel at vpac.org>
> CC:
> David Houlder <djh900 at anusf.anu.edu.au>, torqueusers at supercluster.org
>
>
>
>
> Chris Samuel wrote:
>
>> This took me ages to track down as I assumed that I wasn't looking at
>> an OS bug.
>>
>> The symptom: a user had a job that was supposed to write its output
>> files into a directory owned by a different user but group-writable
>> by one of his secondary groups. It worked on Linux but failed on AIX.
>>
>> Running the 'id' command from the command line showed all the correct
>> groups, but when 'id' ran inside a PBS job only the primary group was
>> listed.
>>
>> Unfortunately, when I ran 'id' through a PBS job myself, it worked
>> fine and my supplementary groups were listed properly.
>>
>> I then realised that the user with the problem is an LDAP user, whilst
>> my user was a local user (because I need to be able to log in if the
>> LDAP server fails). Creating myself an LDAP account with the same
>> groups as my local account reproduced the problem: only my primary
>> group was listed by 'id' when it was run as a PBS job.
>>
>> The generic code in pbs_mom for getting a user's supplementary groups,
>> init_groups() in src/resmom/start_exec.c, uses getgrent() to cycle
>> through all the groups looking for every group a particular user is in.
>>
>> I wrote a simple program that does effectively the same thing, just
>> tallying the number of groups getgrent() found for a user, and was
>> amazed to see that it found 2 for the local user and 0 for the LDAP
>> users!
>>
>> Digging around on Google confirmed my suspicion that getgrent() on AIX
>> is broken for LDAP users. I found this PDF file:
>>
>> http://www-1.ibm.com/servers/aix/whitepapers/ldap_naming.pdf
>>
>> which says on page 5:
>>
>>    Many of the getxxxent() calls are not suitable for the LDAP
>>    environment, and as a result they are not nis_ldap enabled
>>    even though they are listed in the RFC2307 APIs:
>>        getpwent()
>>        getspnam()
>>        getspent()
>>        getgrent()
>>        getservent()
>>        getprotoent()
>>        gethostent()
>>        getnetent()
>>
>> RFC2307 is "An Approach for Using LDAP as a Network Information Service"
>>
>> Any ideas ?
>>
>
> Interestingly, even if init_groups() in start_exec.c does work with
> LDAP, it can thrash your LDAP server by making lots of requests
> (depending on how nss_ldap works).
>
>
> Here is an LDAP-friendly init_groups() written by David Houlder here
> at ANUSF. It uses getgroups() and initgroups().
>
>
> #include <errno.h>
> #include <grp.h>        /* initgroups(), setgroups(): header varies by platform */
> #include <limits.h>     /* NGROUPS_MAX */
> #include <pwd.h>        /* getpwnam() */
> #include <signal.h>     /* sigset_t, sigprocmask() */
> #include <sys/types.h>  /* gid_t */
> #include <unistd.h>     /* getgroups(), getegid() */
>
> /* log_err() is TORQUE's own logging routine, declared in its headers. */
>
> /*
>  * init_groups - build the group list via an LDAP friendly method
>  */
>
> int init_groups(char *pwname,    /* User's name */
>                 int   pwgrp,     /* User's group from pw entry */
>                 int   groupsize, /* size of the array, following argument */
>                 int  *groups)    /* ptr to group array, list built there */
> {
>
>   /* DJH Jan 2004. The original implementation looped over all groups
>      looking for membership. That's OK for /etc/group, but thrashes LDAP
>      if you're using that for groups in nsswitch.conf. Since there is an
>      explicit LDAP backend to do initgroups(3) efficiently in nss_ldap
>      (on Linux), let's use initgroups() to figure out the group
>      membership. A little clunky, but not too ugly. */
>
>
>   extern sigset_t allsigs; /* set up at the start of mom_main */
>   sigset_t savedset;
>
>   int n, nsaved;
>   gid_t savedgroups[NGROUPS_MAX+1]; /* plus one for the egid below */
>
>   /* save current group access because we're about to overwrite it */
>   nsaved=getgroups(NGROUPS_MAX, savedgroups);
>   if (nsaved<0) {
>     log_err(errno, "init_groups", "getgroups");
>     return -1;
>   }
>   /* From the Linux man page: It is unspecified whether the effective
>      group ID of the calling process is included in the returned
>      list. (Thus, an application should also call getegid(2) and add
>      or remove the resulting value.)
>   */
>   savedgroups[nsaved++]=getegid();
>
>   if (pwgrp==0) {
>     /* Emulate the original init_groups() behaviour which treated
>        gid==0 as a special case */
>     struct passwd *pwe=getpwnam(pwname);
>     if (pwe==NULL) {
>       log_err(errno, "init_groups", "no such user");
>       return -1;
>     }
>     pwgrp=pwe->pw_gid;
>   }
>   /* Block signals while we do this or else the signal handler might
>      run with strange group access */
>   if (sigprocmask(SIG_BLOCK, &allsigs, &savedset) == -1) {
>     log_err(errno, "init_groups", "sigprocmask(BLOCK)");
>     return -1;
>   }
>   n=0;
>   if (initgroups(pwname, pwgrp)<0) {
>     log_err(errno, "init_groups", "initgroups");
>     n=-1;
>   } else {
>     /* the cast assumes gid_t and int have the same size */
>     n=getgroups(groupsize, (gid_t *)groups);
>   }
>   /* restore state */
>   if (setgroups(nsaved, savedgroups)<0)
>     log_err(errno, "init_groups", "setgroups");
>   if (sigprocmask(SIG_SETMASK, &savedset, NULL) == -1)
>     log_err(errno, "init_groups", "sigprocmask(SIG_SETMASK)");
>
>   return n;
> }
>
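For comparison, and not part of either posting: glibc and the BSDs provide getgrouplist(), which looks up a user's groups without temporarily changing the calling process's own group list, so no signal blocking or setgroups() restore is needed. On glibc it goes through the same NSS initgroups backend that initgroups() uses. The sketch assumes glibc's prototype (a gid_t array and an int *ngroups in/out count); macOS declares it with int arrays, and availability on AIX is not assumed. The wrapper name lookup_groups is hypothetical.

```c
#include <grp.h>
#include <sys/types.h>

/* Hypothetical wrapper: store the groups of `user` (always including
 * base_gid) in groups[0..cap-1] using glibc's getgrouplist().
 * Returns the number of groups stored, or -1 if they do not all fit.
 * Unlike the initgroups()/getgroups() approach, this never changes the
 * calling process's own supplementary group list. */
int lookup_groups(const char *user, gid_t base_gid, gid_t *groups, int cap)
{
    int ngroups = cap;   /* in: buffer size; out: number of groups found */

    if (getgrouplist(user, base_gid, groups, &ngroups) == -1)
        return -1;       /* buffer too small; ngroups holds the size needed */

    return ngroups;
}
```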
>
>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers
--
--------------------------------------------------------------------------
Dr David Singleton ANU Supercomputer Facility
HPC Systems Manager and APAC National Facility
David.Singleton at anu.edu.au Leonard Huxley Bldg (No. 56)
Phone: +61 2 6125 4389 Australian National University
Fax: +61 2 6125 8199 Canberra, ACT, 0200, Australia
--------------------------------------------------------------------------