[torquedev] Question about IamUserByName, Win32 mom
Sean.Kellogg at fei.com
Wed Feb 22 11:43:55 MST 2012
Dear Torque development community,
I'm trying to troubleshoot a PBS error on a Win32 execute host, and I have run into a dead end.
The symptoms are as follows: after a job is queued to the Torque system, it is passed to a Win32 execute host and then exits with Exit_status=-1. The PBS mom log on the execute host records the failure in the following seven lines:
02/20/2012 07:57:02;0001; pbs_mom;Svr;pbs_mom;LOG_ERROR::IamUserByName, WARNING!!! Can`t find user "simuser"!
02/20/2012 07:57:02;0001; pbs_mom;Svr;pbs_mom;LOG_ERROR::start_exec, Torque Mom Version = 2.5.4, loglevel = 0
02/20/2012 07:57:02;0008; pbs_mom;Req;send_sisters;sending ABORT to sisters for job 2777.hl-vcomputenodemaster.DOMAIN.COMPANYNAME.com
02/20/2012 07:57:05;0080; pbs_mom;Svr;preobit_reply;top of preobit_reply
02/20/2012 07:57:05;0080; pbs_mom;Svr;preobit_reply;DIS_reply_read/decode_DIS_replySvr worked, top of while loop
02/20/2012 07:57:05;0080; pbs_mom;Svr;preobit_reply;in while loop, no error from job stat
02/20/2012 07:57:05;0080; pbs_mom;Job;2777.hl-vcomputenodemaster.DOMAIN.COMPANYNAME.com;obit sent to server
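To see how often this failure pattern recurs across many runs, the mom log records can be split on semicolons into six fields (timestamp, event code, daemon, object type, object name, message). The Python sketch below assumes exactly the six-field layout visible in the excerpt above and uses a simple heuristic: it reports each job for which an IamUserByName warning was followed by a "sending ABORT to sisters" record. The helper names `parse_record` and `jobs_aborted_after_user_lookup` are hypothetical and not part of Torque.

```python
import re
from typing import Iterable, List, Optional

# Hypothetical helpers. Assumes each mom log record has the six
# semicolon-separated fields seen in the excerpt:
# timestamp;event_code; daemon;object_type;object_name;message
ABORT_RE = re.compile(r"sending ABORT to sisters for job (\S+)")


def parse_record(line: str) -> Optional[dict]:
    """Split one log record into its six fields, or return None."""
    parts = line.rstrip("\n").split(";", 5)
    if len(parts) != 6:
        return None
    timestamp, event, daemon, obj_type, obj_name, message = (p.strip() for p in parts)
    return {
        "timestamp": timestamp,
        "event": event,
        "daemon": daemon,
        "type": obj_type,
        "name": obj_name,
        "message": message,
    }


def jobs_aborted_after_user_lookup(lines: Iterable[str]) -> List[str]:
    """Return job IDs whose ABORT follows an IamUserByName warning (heuristic)."""
    pending_warning = False
    jobs = []
    for line in lines:
        rec = parse_record(line)
        if rec is None:
            continue  # skip lines that do not match the assumed layout
        if "IamUserByName" in rec["message"]:
            pending_warning = True
        elif pending_warning:
            match = ABORT_RE.search(rec["message"])
            if match:
                jobs.append(match.group(1))
                pending_warning = False
    return jobs
```

Passing an open log file (or a list of its lines) to `jobs_aborted_after_user_lookup` yields the affected job IDs, which can then be compared with the total number of jobs submitted.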
I suspect the most interesting information is in the first line, following the function name IamUserByName. I have found very little about this error; the only discussion I know of is torquedev mailing list thread 2173 (http://www.supercluster.org/pipermail/torquedev/2010-June/002173.html).
To make matters more confusing, the problem occurs on only about 80% of job submissions; the other 20% run normally. This makes me wonder whether IamUserByName has a reliability issue.
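One way to test the intermittency hypothesis independently of pbs_mom is to resolve the same account many times on the execute host and count failures. The Python sketch below assumes, without confirmation from the Torque source, that the Win32 mom resolves users through a POSIX-style password database (for example under Cygwin), so it calls `pwd.getpwnam`. If IamUserByName uses a different Windows account API, this probe does not exercise the same code path. The function name `probe_user_lookup` is hypothetical.

```python
import pwd
import time
from typing import Tuple


def probe_user_lookup(user: str, attempts: int = 100, delay_s: float = 0.0) -> Tuple[int, int]:
    """Resolve `user` repeatedly; return (successes, failures).

    Assumption: the mom's lookup behaves like pwd.getpwnam. This probes
    the local password database only, not Torque itself.
    """
    successes = 0
    failures = 0
    for _ in range(attempts):
        try:
            pwd.getpwnam(user)
            successes += 1
        except KeyError:
            # getpwnam raises KeyError when no entry exists for the name
            failures += 1
        if delay_s:
            time.sleep(delay_s)
    return successes, failures


if __name__ == "__main__":
    ok, bad = probe_user_lookup("simuser", attempts=100)
    print(f"simuser: {ok} resolved, {bad} not found")
```

A mix of successes and failures would suggest the account source itself is unstable on that host, while consistent failures would suggest the account is simply missing or named differently there.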
Can anyone provide any insight?