[torqueusers] Torque losing LDAP connection

Corey Hirschman corey at rentec.com
Thu Apr 21 15:52:44 MDT 2005


I was wondering if anyone else using Torque in an LDAP environment has dealt with this issue.  Every few days we get error messages saying the pbs_server process could not connect to the LDAP server.  Here is an example from today:

Apr 21 16:39:23 monsterrq pbs_server: nss_ldap: reconnecting to LDAP server...
Apr 21 16:39:23 monsterrq pbs_server: nss_ldap: reconnecting to LDAP server...
Apr 21 16:39:23 monsterrq pbs_server: nss_ldap: reconnecting to LDAP server (sleeping 4 seconds)...
Apr 21 16:39:27 monsterrq pbs_server: nss_ldap: reconnecting to LDAP server (sleeping 8 seconds)...
Apr 21 16:39:35 monsterrq pbs_server: nss_ldap: reconnecting to LDAP server (sleeping 16 seconds)...
Apr 21 16:39:51 monsterrq pbs_server: nss_ldap: reconnecting to LDAP server (sleeping 32 seconds)...
Apr 21 16:40:23 monsterrq pbs_server: nss_ldap: could not hard reconnect to LDAP server - Can't contact LDAP server
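
For reference, the sleep intervals in the log above double each time (4, 8, 16, 32 seconds), which accounts for the 60 seconds between the first message and the final failure. A minimal sketch of that schedule follows. The starting delay and attempt count are simply read off this log and are assumptions, not nss_ldap's actual configuration, and `backoff_schedule` is a hypothetical helper name.

```python
from datetime import datetime, timedelta

def backoff_schedule(start, first_sleep=4, attempts=4):
    """Return the (timestamp, sleep_seconds) pairs of a doubling backoff,
    plus the time at which the last sleep ends."""
    schedule = []
    now = start
    sleep = first_sleep
    for _ in range(attempts):
        schedule.append((now, sleep))
        now += timedelta(seconds=sleep)
        sleep *= 2  # each retry waits twice as long as the previous one
    return schedule, now

# Timestamp of the first "sleeping" message in the log above.
start = datetime(2005, 4, 21, 16, 39, 23)
schedule, gave_up = backoff_schedule(start)
for when, sleep in schedule:
    print(when.strftime("%H:%M:%S"), f"sleeping {sleep} seconds")
print(gave_up.strftime("%H:%M:%S"), "final failure")
```

The computed times line up with the syslog timestamps, ending at 16:40:23.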

During this time no one can submit jobs and the Torque server is basically dead in the water.  Everything else on the machine works normally, and I can still make LDAP queries that return fine.  The LDAP servers themselves are also up and experience no problems; the problem appears to be isolated to the pbs_server process.  After about 15 minutes everything starts working normally again.  Sometimes this 15-minute wait is unacceptable and I have to restart the server, after which everything works fine again.
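
One way to confirm that name-service lookups still work from a separate process while pbs_server is stuck is to time a lookup that goes through the C library's NSS path. A minimal sketch, assuming Python is available on the server host; `timed_nss_lookup` is a hypothetical helper and `root` is only a placeholder username, neither is part of Torque or nss_ldap:

```python
import pwd
import time

def timed_nss_lookup(username):
    """Look up a user via getpwnam (which goes through NSS) and
    return (found, elapsed_seconds)."""
    start = time.monotonic()
    try:
        pwd.getpwnam(username)
        found = True
    except KeyError:
        # getpwnam raises KeyError when no such user is found.
        found = False
    return found, time.monotonic() - start

# Placeholder username; substitute an account that lives only in LDAP.
found, elapsed = timed_nss_lookup("root")
print(f"found={found} elapsed={elapsed:.3f}s")
```

A fast, successful lookup here during an outage would be consistent with the problem being confined to the connection held by the long-running pbs_server process.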

Is anyone else experiencing problems like this, or does anyone have a clue as to why the pbs_server process is losing communication with the LDAP server?

Thank you,

Corey Hirschman
Renaissance Technologies

