[torqueusers] Sporadic UID errors
pregier at ittc.ku.edu
Fri Jun 22 13:14:12 MDT 2012
Sorry if this has been raised before; I'm still going through the archives. (There is another LDAP thread active, but I think that problem is very different.)
I'm trying to evaluate (stress test) Torque 3.0.5 and 4.0.4 for a possible upgrade from 2.x and have come across some odd behavior. In particular, when I submit 1000 small jobs to a fake one-node cluster running Torque 3.0.5 and Maui 3.3.1, with users authenticated against LDAP, I tend to get 2-3 failed submissions (i.e., about 0.25% of my jobs never get accepted). Both packages were built in-house as RPMs (not by me, but I can retrieve specfiles etc. if that would help). For example:
qsub: Bad UID for job execution MSG=User pregier does not exist in server password file
The submissions come from a simple loop; nothing differs between jobs 14291 and 14293 and what should have been 14292.
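For reference, a minimal sketch of the kind of submission loop involved, not the exact script. It assumes `qsub` is on the PATH and accepts the job script on standard input; the function names and the one-line `sleep 1` job are hypothetical placeholders.

```python
import subprocess
from typing import Callable, List, Tuple


def submit_with_qsub(script: str) -> Tuple[int, str]:
    """Pipe a job script to qsub on stdin; return (exit code, combined output)."""
    result = subprocess.run(
        ["qsub"], input=script, capture_output=True, text=True
    )
    return result.returncode, (result.stdout + result.stderr).strip()


def stress_submit(
    count: int,
    script: str = "#!/bin/sh\nsleep 1\n",  # placeholder job, an assumption
    submit: Callable[[str], Tuple[int, str]] = submit_with_qsub,
) -> List[Tuple[int, str]]:
    """Submit the same script `count` times; return (index, message) per failure."""
    failures = []
    for i in range(count):
        code, output = submit(script)
        if code != 0:
            # Keep the loop index and qsub's message so the failed
            # submissions can be lined up against the server logs.
            failures.append((i, output))
    return failures
```

Calling `stress_submit(1000)` and printing the returned list shows which iterations were rejected and with what message.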
Is this normal? Are there precautions to avoid it, or is this a bug I should be reporting in more detail?
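One check that might help narrow this down, on the assumption (not confirmed) that the message reflects a failed password-database lookup on the server host: resolve the same user repeatedly through the system's name service and count the misses. This is only a sketch; `count_lookup_failures` is a hypothetical helper, and it relies on Python's standard `pwd.getpwnam`, which raises `KeyError` when no entry is found.

```python
import pwd
from typing import Callable, List


def count_lookup_failures(
    user: str,
    attempts: int,
    lookup: Callable[[str], object] = pwd.getpwnam,
) -> List[int]:
    """Resolve `user` repeatedly; return the attempt indices that failed."""
    failed = []
    for i in range(attempts):
        try:
            lookup(user)
        except KeyError:
            # pwd.getpwnam raises KeyError when the user cannot be resolved.
            failed.append(i)
    return failed
```

Running `count_lookup_failures("pregier", 1000)` on the server host while the submission loop is active would show whether lookups fail outside of Torque as well. An empty result would not rule out the lookup path, since the server may resolve users differently than this script does.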
Thanks for any suggestions; I'm not terribly experienced with Torque, so I'm not sure how quickly I should be bringing this sort of thing to the list. I can provide more details about my setup and/or stress tests, but didn't want to dump too much useless information in my first post.
Student assistant system administrator
University of Kansas, ITTC