[torqueusers] Torque : pbs_mom stuck with "no password entry for user <someuser>" message

Pascal Mayani pascal.mayani at eurecom.fr
Tue Dec 7 14:21:03 MST 2010


Henri Marsalet <henri.marsalet <at> yahoo.fr> writes:

> 
> Hi,
> 
> I run a 256-node PBS cluster using the latest Torque 2.5.3, under Linux
> Fedora Core 6 with a 2.6.22-9 kernel. Users are authenticated by an LDAPS
> server with the native pam_ldap module.
> 
> Most of the time the system works flawlessly, but sometimes, apparently at
> random, ALL nodes stop accepting jobs from the PBS server.
> Each time a job is then submitted, the following error appears in the node's
> syslog:
> 
> pbs_mom: LOG_ERROR::start_exec, no password entry for user <name of the user>

Hi Henri,

I had exactly the same problem on my PBS cluster.

It turned out that the client-side packages had been compiled in 32-bit mode on
the master node and then distributed to and installed on the compute nodes,
which run a 64-bit version of Linux. In that situation some library calls,
including getpwnam(), can intermittently misbehave.

This is probably a fairly common mistake, because the master node does not need
a lot of memory and so often runs a 32-bit system.
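
For reference, the "no password entry" message corresponds to a getpwnam()
lookup that returns nothing. Below is a rough Python sketch of that lookup.
The helper names lookup() and describe() are mine, purely for illustration.
Note that it only exercises the lookup path for the word size of the Python
interpreter that runs it, so a clean result from a 64-bit interpreter says
nothing about a 32-bit pbs_mom.

```python
import pwd
import struct


def lookup(username):
    """Return the passwd entry for username, or None if there is none."""
    try:
        return pwd.getpwnam(username)
    except KeyError:
        # The pwd module raises KeyError when getpwnam() finds no entry.
        return None


def describe(username):
    """Summarise the lookup, including this interpreter's word size."""
    bits = struct.calcsize("P") * 8  # pointer size: 32 or 64
    entry = lookup(username)
    if entry is None:
        return "%d-bit lookup of %s: no password entry" % (bits, username)
    return "%d-bit lookup of %s: uid=%d" % (bits, username, entry.pw_uid)
```

Running describe() for the affected user on a node that is refusing jobs shows
whether a lookup of the interpreter's own word size still succeeds there.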

You can see the MOM's library dependencies with the ldd command. On the 32-bit
platform:

ldd /usr/local/sbin/pbs_mom
        linux-gate.so.1 =>  (0xffffe000)
        libutil.so.1 => /lib/libutil.so.1 (0x471ad000)
        libtorque.so.2 => /usr/local/lib/libtorque.so.2 (0xf7edb000)
        libpthread.so.0 => /lib/libpthread.so.0 (0x471dc000)
        libc.so.6 => /lib/libc.so.6 (0x4706e000)
        /lib/ld-linux.so.2 (0x47051000)

And on the 64-bit platform:

ldd /usr/local/sbin/pbs_mom
        linux-vdso.so.1 =>  (0x00007fff689fd000)
        libutil.so.1 => /lib64/libutil.so.1 (0x0000003eaa200000)
        libtorque.so.2 => /usr/local/lib64/libtorque.so.2 (0x00002ad9421e0000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x0000003e9c800000)
        libc.so.6 => /lib64/libc.so.6 (0x0000003e9bc00000)
        /lib64/ld-linux-x86-64.so.2 (0x0000003e9b800000)
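
If you have many nodes to check, the comparison of the two listings above can
be automated. Here is a minimal Python sketch that pulls the library
directories out of ldd output. The function name library_dirs() is
hypothetical, and reading /lib as 32-bit and /lib64 as 64-bit assumes the
Fedora-style layout shown above; other distributions lay things out
differently.

```python
import re

# Matches "libc.so.6 => /lib64/libc.so.6 (0x...)" as well as bare loader
# lines such as "/lib64/ld-linux-x86-64.so.2 (0x...)". Entries without a
# path (linux-gate, linux-vdso) match but leave the path group empty.
LDD_LINE = re.compile(
    r"^\s*(?:(\S+)\s+=>\s+)?(/\S+)?\s*\(0x[0-9a-fA-F]+\)\s*$"
)


def library_dirs(ldd_output):
    """Return the set of directories that the resolved libraries live in."""
    dirs = set()
    for line in ldd_output.splitlines():
        match = LDD_LINE.match(line)
        if match and match.group(2):
            dirs.add(match.group(2).rsplit("/", 1)[0])
    return dirs


sample = """\
        libtorque.so.2 => /usr/local/lib64/libtorque.so.2 (0x00002ad9421e0000)
        libc.so.6 => /lib64/libc.so.6 (0x0000003e9bc00000)
"""
print(sorted(library_dirs(sample)))  # ['/lib64', '/usr/local/lib64']
```

A 64-bit node whose pbs_mom reports /lib and /usr/local/lib rather than the
lib64 directories is running the 32-bit build.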

So have a look at this first. I guess your test program is not a valid check if
it is linked against the local 64-bit libraries while pbs_mom is still linked
against the 32-bit ones.
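
A more direct check than reading library paths is to look at the binary
itself. The following Python sketch reads the ELF header and reports whether a
file is a 32-bit or 64-bit build. The function name elf_class() is
hypothetical; the header layout (magic bytes followed by the EI_CLASS byte) is
the standard ELF one.

```python
def elf_class(path):
    """Return 32 or 64 for an ELF file, or None if it is not recognised."""
    with open(path, "rb") as handle:
        header = handle.read(5)
    # An ELF file starts with the four magic bytes 0x7f 'E' 'L' 'F'.
    if len(header) < 5 or header[:4] != b"\x7fELF":
        return None
    # The fifth byte is EI_CLASS: 1 means ELFCLASS32, 2 means ELFCLASS64.
    return {1: 32, 2: 64}.get(header[4])
```

Calling elf_class("/usr/local/sbin/pbs_mom") on a compute node, and again on
the test program, tells you whether the two are really comparable.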

Pascal



