[torqueusers] Sporadic UID errors

Phil Regier pregier at ittc.ku.edu
Fri Jun 22 14:48:19 MDT 2012


Oops.  An error and an omission:  I meant 4.0.2, not 4.0.4 (I am trying a 4.0.3 snapshot now), and I should also have noted that, as part of the stress test, I am constantly watching repeated qstat output while the jobs are submitted.  The problem does not seem to appear with 4.0.x as such; might this be related to the switch from a single-threaded server to a multi-threaded one?

----- Original Message -----
From: "Phil Regier" <pregier at ittc.ku.edu>
To: torqueusers at supercluster.org
Sent: Friday, June 22, 2012 2:14:12 PM
Subject: Sporadic UID errors

Sorry if this has been raised before (there is another LDAP thread active, but I think the problem there is very different); I'm still going through the archives.

I'm trying to evaluate (stress test) Torque 3.0.5 and 4.0.4 for a possible upgrade from 2.x and have come across some odd behavior.  The test setup is a fake one-node cluster running Torque 3.0.5 and Maui 3.3.1 (built in-house as RPMs -- not by me, but I can retrieve specfiles etc. if that would help) and authenticated against LDAP.  When I submit 1000 small jobs to it, I tend to get 2-3 failed submissions (i.e., about 0.25% of my jobs never get accepted); for example:

...
14289.localhost
14290.localhost
14291.localhost
qsub: Bad UID for job execution MSG=User pregier does not exist in server password file

14293.localhost
14294.localhost
14295.localhost
...


This is just a loop; there is no difference between jobs 14291 and 14293 and what should have been job 14292.
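
A minimal sketch of a submission loop of this kind, which also records which submissions were rejected and why.  The actual loop used for the test is not shown in this message, so everything here is an assumption: the function name submit_jobs, the one-line "sleep 1" job script, and the choice to pipe the script to qsub on standard input are all hypothetical.

```python
import subprocess


def submit_jobs(count, command=("qsub",), script="sleep 1\n"):
    """Submit `count` identical jobs and return (accepted_ids, failures).

    `command` is the submission command (assumed here to be plain qsub,
    which reads the job script from standard input when no file is given).
    `script` is a placeholder job body, not the one from the original test.
    """
    accepted = []   # job ids printed by successful submissions
    failures = []   # (submission index, error text) for rejected ones
    for index in range(count):
        result = subprocess.run(
            list(command),
            input=script,          # job script is piped to the command
            capture_output=True,
            text=True,
        )
        if result.returncode == 0:
            accepted.append(result.stdout.strip())
        else:
            failures.append((index, result.stderr.strip()))
    return accepted, failures
```

Calling submit_jobs(1000) and printing the failures list would show the position of each rejected submission in the sequence together with the message qsub wrote to standard error, which makes it easier to see whether the failures cluster or are spread evenly through the run.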

Is this normal?  Are there precautions to avoid it, or is this a bug I should be reporting in more detail?

Thanks for any suggestions; I'm not terribly experienced with Torque, so I'm not sure how quickly I should be bringing this sort of thing to the list.  I can provide more details about my setup and/or stress tests, but didn't want to dump too much useless information in my first post.

Phil Regier
Student assistant system administrator
University of Kansas, ITTC

