[torqueusers] How to configure Torque with PAM right? (and cpuset also!)

Gus Correa gus at ldeo.columbia.edu
Mon Dec 20 15:40:08 MST 2010


Hi Garrick

Many thanks for your very clear explanations, as usual.

1) I will use the new PAM libraries as you suggested.

**

2) I know asking for better documentation isn't good etiquette,
but since Santa Claus is coming to town, it may be worth trying.

The Torque Admin Manual, section 3.4 Host Security, only talks
about the old pam_authuser:

http://www.clusterresources.com/torquedocs21/3.4hostsecurity.shtml

It would be great to have it updated, perhaps with a writeup
extracted from your email, pointing to the new PAM module,
or explaining how to set up either the new or the old PAM.
A few examples of pam config files for each version would also help.

**

3) My goal is simply to prevent users from ssh-ing directly/interactively
into the compute nodes, at least those users who don't have
jobs running there.
However, I don't want to break any legitimate ssh access to the
compute nodes, for instance, the ssh connections used by some
forms of mpiexec (OSC mpiexec, MPICH2 mpiexec_rsh, etc.).
I'd guess this is what a lot of people want from PAM-supported Torque.

Any suggestions for the lines in the /etc/pam.d files to achieve this?
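
For concreteness, here is the kind of thing I have in mind for
/etc/pam.d/sshd on the compute nodes, pieced together from my reading
of src/pam/README.pam (an untested sketch; the pam_succeed_if line
that exempts root is my own guess, not something from the README):

# /etc/pam.d/sshd (compute nodes) -- untested sketch
# let root in unconditionally (my assumption)
account    sufficient   pam_succeed_if.so uid eq 0
# allow a login only if the user has a job running on this node;
# since mpiexec-spawned ssh sessions run as the job owner, they
# should pass this test on the nodes assigned to the job
account    required     pam_pbssimpleauth.so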

**

4) Indeed, it wasn't PAM at all that kept the jobs from running.
They actually showed a funny behavior, flipping state from E to R to Q,
but hanging in the queue forever.

A search through the syslog message files showed that the failure
came from my incomplete attempt to build Torque with --enable-cpuset.
I hadn't finished the cpuset setup, which requires extra steps
beyond merely building Torque.
However, I managed to fix it, after some googling to find the bits and
pieces that were missing.

As with PAM support, section 3.5 on Linux Cpuset Support
in the Admin Manual is a bit terse and could be expanded.

In case this is helpful to others, here is what I did
(steps B-E are condensed into a single shell sketch after the list):

A) configure Torque with --enable-cpuset,
make, make install, make packages, install packages on compute nodes,
setup the /etc/init.d startup files, etc.

B) (as root or sudo) mkdir /dev/cpuset (compute nodes)

C) add this line to /etc/fstab,
to ensure /dev/cpuset mounts on boot (compute nodes):

none	/dev/cpuset	cpuset	defaults	0 0

D) mount /dev/cpuset (compute nodes)

E) service pbs_mom start (compute nodes)

(Or instead of D & E just reboot the compute nodes.)

F) start the server and scheduler (or Maui scheduler)
on the head node
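
For reference, here is my shorthand for steps B through E, to be run
as root on each compute node (an untested condensation of what I did;
adjust paths and service names to your distro):

mkdir -p /dev/cpuset
echo 'none	/dev/cpuset	cpuset	defaults	0 0' >> /etc/fstab
mount /dev/cpuset
service pbs_mom start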

It works for me.
No jobs stuck in the queue anymore,
and the bells and whistles of cpuset are in place.
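
A quick sanity check: while a job is running on a node, pbs_mom
should have created a per-job cpuset under /dev/cpuset/torque.
With a made-up example job id, something like

cat /dev/cpuset/torque/123.headnode/cpus

should list the CPUs assigned to that job.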

Now, gotta get PAM right.

Happy Holidays,
Gus Correa
---------------------------------------------------------------------
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
---------------------------------------------------------------------

Garrick Staples wrote:
> On Mon, Dec 20, 2010 at 01:13:53AM -0500, Gustavo Correa alleged:
>> Dear Torque experts
>>
>> I am trying to configure and install (and make it work)
>> Torque 2.4.11 --with-pam on an x86_64 cluster.
>> I am a bit confused.  Please shed some light.
>>
>> 1) Configuring --with-pam seems to produce these libraries (also installed on the compute
>> nodes via the torque-pam package, I suppose):
>>
>> /lib64/security/pam_pbssimpleauth.a
>> /lib64/security/pam_pbssimpleauth.la
>> /lib64/security/pam_pbssimpleauth.so
>>
>> However, if I do only this, when I submit jobs they sit forever on the queue and don't start.
>>  
>> (Scheduling is enabled, server, scheduler are up in the head node, 
>> moms are up in the compute nodes, queue is enabled and started.)
> 
> pam_pbssimpleauth is for the compute nodes: it allows users to log in
> when they have a job running on that node. See src/pam/README.pam.
> 
> It has nothing to do with the server or scheduling.
> 
>  
>> 2) By contrast, the documentation speaks about another package: 
>> pam_authuser (in the contrib directory),
>> and gives instructions on how to build it (via make+make install).
> 
> This is an older PAM module that does nearly the same thing in a different way.
> It requires that the prologue/epilogue scripts manage a list of usernames in
> /etc/authuser.
> 
>  
>> Make produces another library:  pam_authuser.so,
>> which the Makefile wants to install in /lib/security (NOT in /lib64/security).
>> I didn't do make install, because I expected the library to go to /lib64/security,
>> since my cluster is x86_64.
>> Right or wrong?
> 
> Correct. 64bit PAM libs should go into /lib64/security. Since pam_authuser is
> outside of the torque distribution, it doesn't benefit from torque's autoconf
> stuff. It was written before 64bit distros were common.
> 
>  
>> The Torque user's guide and the README files have further instructions
>> to install pam_authuser on the compute nodes, edit PAM security files, etc.
>> However, I stopped short of following procedure 2) all the way, 
>> because I was not sure if it would complement or conflict with procedure 1), or what else.
> 
> Do not use both PAM modules because they do the same thing. Obviously, I
> recommend the newer PAM module that is included in the torque distribution.
> 
> 
>> Questions:
>>
>> Are the two approaches above complementary, independent, or conflicting?
> 
> Conflicting.
> 
>  
>> Should I use 1) only,  2) only,  or 1) + 2) ?
> 
> 1 or 2. Up to you.
> 
>   
>> How do I make Torque work with PAM, and the jobs run, instead of sitting forever in
>> the queue?
> 
> That's 2 questions. Jobs running has nothing to do with torque's pam support.