[torqueusers] APAC LDAP Friendly init_groups() - an acceptable alternative to the current one ?

David Singleton David.Singleton at anu.edu.au
Sun Oct 30 17:47:33 MST 2005


Chris,

There was a slight problem with that code that actually crashed SGI CXFS
servers (a problem of too many gids).  Please modify the code as follows:

-    savedgroups[nsaved++]=getegid();
+    {
+       gid_t momegid = getegid();
+       int i, found=0;
+       for(i=0;i<nsaved && !found;i++)
+           found = (savedgroups[i] == momegid);
+       if (!found) savedgroups[nsaved++]=momegid;
+    }

And yes, we use this code on Linux and used it on Tru64.

David


Chris Samuel wrote:
> Hi folks,
> 
> Back in April I wrote about the pain I was having on AIX with secondary groups 
> being discarded because AIX does not implement LDAP support in the 
> getgrent() library call (and does not document the fact).
> 
> In reply Dave Singleton from APAC kindly posted a replacement init_groups() 
> which is much more LDAP friendly, the posting of which I've attached to this 
> email.
> 
> I've been using this on the AIX node in question with success ever since, and 
> I believe that APAC are using this code on their various clusters (the now 
> defunct Tru64 Alpha SC as well as their Linux and Altix clusters), though 
> Dave can confirm/refute this.. :-)
> 
> Anyway, it would be really nice to see this folded into Torque so I didn't 
> need to remember to patch it each time I touched the AIX box in question.
> 
> The problem is that this code is shared across all architectures, so I don't 
> know whether people would be happy to use this code across the lot or whether 
> it would be necessary to make it a per-architecture implementation ?
> 
> Feedback appreciated!
> 
> Chris
> 
> 
> ------------------------------------------------------------------------
> 
> Subject: Re: [torqueusers] Broken AIX getgrent() results in supplementary
>   groups not set for LDAP users PBS jobs
> From: David Singleton <David.Singleton at anu.edu.au>
> Date: Thu, 28 Apr 2005 10:21:10 +1000
> To: Chris Samuel <csamuel at vpac.org>
> CC: David Houlder <djh900 at anusf.anu.edu.au>, torqueusers at supercluster.org
> 
> 
> 
> 
> Chris Samuel wrote:
> 
>> This took me ages to track down as I assumed that I wasn't looking at 
>> an OS bug.
>>
>> The symptoms were a user who had a job that was supposed to write its 
>> output files into a directory that was group writeable by one of his 
>> secondary groups, but owned by a different user.  It worked on Linux 
>> but failed on AIX.
>>
>> Running the 'id' command from the command line showed all the correct 
>> groups, but when the 'id' command ran via a PBS job only the primary 
>> group was listed.
>>
>> Unfortunately when I ran the 'id' command through a PBS job it worked 
>> fine and my supplementary groups were listed properly.
>>
>> I then realised that the user with the problem is an LDAP user whilst 
>> my user was a local user (because I need to login if the LDAP server 
>> fails).  Creating myself an LDAP account with the same groups as my 
>> local account duplicated the problem, only my primary group was listed 
>> by 'id' when that was run as a PBS job.
>>
>> The generic code in the pbs_mom for getting a user's supplementary 
>> groups in init_groups() in src/resmom/start_exec.c uses getgrent() to 
>> cycle through all the groups searching for all those a particular user 
>> is in.
>>
>> I wrote a simple program that does effectively the same thing, just 
>> tallying the number of groups it finds for a user with getgrent(), 
>> and was amazed to see that it found 2 for the local user and 0 for the 
>> LDAP users!
>>
>> Digging around on Google confirmed my suspicion that getgrent() on AIX 
>> is broken for LDAP users. I found this PDF file:
>>
>>     http://www-1.ibm.com/servers/aix/whitepapers/ldap_naming.pdf
>>
>> which says on page 5:
>>
>>  Many of the getxxxent() calls are not suitable for the LDAP
>>  environment, and as a result they are not nis_ldap enabled
>>  even though they are listed in the RFC2307 APIs:  
>>  getpwent ()
>>  getspnam ()
>>  getspent ()
>>  getgrent ()
>>  getservent ()
>>  getprotoent ()
>>  gethostent ()
>>  getnetent ()
>>  
>> RFC2307 is "An Approach for Using LDAP as a Network Information Service"
>>
>> Any ideas ?
>>
> 
> Interestingly, even if init_groups() in start_exec.c does work with
> LDAP, it can thrash your LDAP server by making lots of requests
> (depending on how nss_ldap works).
> 
> 
> Here is an LDAP friendly init_groups() written by David Houlder here
> at ANUSF.  Uses getgroups() and initgroups().
> 
> 
> /*
>  * init_groups - build the group list via an LDAP friendly method
>  */
> 
> int init_groups(char *pwname,   /* User's name */
>                 int   pwgrp,    /* User's group from pw entry */
>                 int   groupsize,/* size of the array, following argument */
>                 int  *groups)   /* ptr to group array, list build there */
> {
> 
>     /* DJH Jan 2004. The original implementation looped over all groups
>        looking for membership. That's OK for /etc/group, but thrashes LDAP
>        if you're using that for groups in nsswitch.conf. Since there is an
>        explicit LDAP backend to do initgroups(3) efficiently in nss_ldap
>        (on Linux), let's use initgroups() to figure out the group
>        membership. A little clunky, but not too ugly.  */
> 
> 
>     extern sigset_t allsigs; /* set up at the start of mom_main */
>     sigset_t savedset;
> 
>     int n, nsaved;
>     gid_t savedgroups[NGROUPS_MAX+1]; /* plus one for the egid below */
> 
>     /* save current group access because we're about to overwrite it */
>     nsaved=getgroups(NGROUPS_MAX, savedgroups);
>     if (nsaved<0) {
>         log_err(errno, "init_groups", "getgroups");
>         return -1;
>     }
>     /* From the Linux man page: It is unspecified whether the effective
>        group ID of the calling process is included in the returned
>        list. (Thus, an application should also call getegid(2) and add
>        or remove the resulting value.)
>     */
>     savedgroups[nsaved++]=getegid();
> 
>     if (pwgrp==0) {
>         /* Emulate the original init_groups() behaviour which treated
>            gid==0 as a special case */
>         struct passwd *pwe=getpwnam(pwname);
>         if (pwe==NULL) {
>             log_err(errno, "init_groups", "no such user");
>             return -1;
>         }
>         pwgrp=pwe->pw_gid;
>     }
>     /* Block signals while we do this or else the signal handler might
>        run with strange group access */
>     if (sigprocmask(SIG_BLOCK, &allsigs, &savedset) == -1) {
>         log_err(errno, "init_groups", "sigprocmask(BLOCK)");
>         return -1;
>     }
>     n=0;
>     if (initgroups(pwname, pwgrp)<0) {
>         log_err(errno, "init_groups", "initgroups");
>         n=-1;
>     } else {
>         n=getgroups(groupsize, (gid_t *)groups);
>     }
>     /* restore state */
>     if (setgroups(nsaved, savedgroups)<0)
>         log_err(errno, "init_groups", "setgroups");
>     if (sigprocmask(SIG_SETMASK, &savedset, NULL) == -1)
>         log_err(errno, "init_groups", "sigprocmask(SIG_SETMASK)");
> 
>     return n;
> }
> 
> 
> 
> 
> ------------------------------------------------------------------------
> 
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers


-- 
--------------------------------------------------------------------------
    Dr David Singleton               ANU Supercomputer Facility
    HPC Systems Manager              and APAC National Facility
    David.Singleton at anu.edu.au       Leonard Huxley Bldg (No. 56)
    Phone: +61 2 6125 4389           Australian National University
    Fax:   +61 2 6125 8199           Canberra, ACT, 0200, Australia
--------------------------------------------------------------------------

