[torqueusers] pbs_mom no password entry for user

Mary Ellen Fitzpatrick mfitzpat at bu.edu
Thu Jul 31 14:15:39 MDT 2008


Hi,
I have installed and configured torque-2.3.1 and maui-3.2.6p18 on my head 
node, nona-man. I thought I had everything configured correctly, but 
apparently not.


When I submit a test job, I get the following error on the compute node 
node1003:
07/31/2008 15:52:55;0008;   pbs_mom;Job;process_request;request type CopyFiles from host nona-man allowed
07/31/2008 15:52:55;0008;   pbs_mom;Job;11.nona-man;attempting to copy file 'nona-man:/home/mef/test3.out'
07/31/2008 15:52:55;0001;   pbs_mom;Svr;pbs_mom;Success (0) in fork_to_user, cannot find user 'mef' in password file
07/31/2008 15:52:55;0080;   pbs_mom;Req;req_reject;Reject reply code=15023(Bad UID for job execution REJHOST=node1003 MSG=cannot find user 'mef' in password file), aux=0, type=CopyFiles, from PBS_Server@nona-man
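
The pbs_mom log lines above are semicolon-delimited, so they are easy to 
split when searching a long mom log for one job or one error. A minimal 
Python sketch follows; the field names (timestamp, event, daemon, 
obj_class, obj_name, message) are labels chosen here for illustration, 
not official TORQUE terminology, and parse_mom_line is a hypothetical 
helper.

```python
def parse_mom_line(line):
    """Split one pbs_mom log line into six fields, or return None.

    The field names are illustrative labels (an assumption), based only
    on the layout of the log lines quoted in this message.
    """
    # Limit the split to 5 so semicolons inside the message are kept.
    parts = line.rstrip("\n").split(";", 5)
    if len(parts) != 6:
        return None
    keys = ("timestamp", "event", "daemon", "obj_class", "obj_name", "message")
    return {key: value.strip() for key, value in zip(keys, parts)}


def lines_mentioning(lines, needle):
    """Return parsed records whose message contains `needle`."""
    records = (parse_mom_line(line) for line in lines)
    return [rec for rec in records if rec and needle in rec["message"]]
```

For example, lines_mentioning(open(path), "password file") would keep only 
the records about the failed user lookup, where path is wherever the mom 
log lives on the node.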

However, when I run ypcat passwd on node1003, the account mef is listed, 
so I am not sure why I am getting that error.
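
One check that may help narrow this down: ypcat reads the NIS map 
directly, whereas a process that resolves a user name through the 
standard library call getpwnam() depends on the name-service 
configuration (for example /etc/nsswitch.conf) on that host. The Python 
sketch below performs that lookup and could be run on node1003. It 
assumes, without confirmation from the logs above, that pbs_mom resolves 
users through the same mechanism; lookup_user is a hypothetical helper 
name.

```python
import pwd


def lookup_user(name):
    """Return passwd fields for `name` via getpwnam(), or None if absent."""
    try:
        entry = pwd.getpwnam(name)
    except KeyError:
        # getpwnam() found no entry through the configured name services.
        return None
    return {
        "name": entry.pw_name,
        "uid": entry.pw_uid,
        "gid": entry.pw_gid,
        "home": entry.pw_dir,
        "shell": entry.pw_shell,
    }


if __name__ == "__main__":
    user = "mef"  # the account named in the pbs_mom error
    info = lookup_user(user)
    if info is None:
        print("no password entry for %s via getpwnam()" % user)
    else:
        print("found %s: uid=%d gid=%d home=%s"
              % (info["name"], info["uid"], info["gid"], info["home"]))
```

If this reports no entry while ypcat passwd still lists the account, the 
difference would point at the name-service configuration on the node 
rather than at the NIS map itself.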

Also, regarding the "Bad UID for job execution" error: my qmgr settings 
list acl_hosts and submit_hosts as:
set server acl_host_enable = True
set server acl_hosts = *
set server acl_hosts += nona-man
set server submit_hosts += nona-man

/var/log/messages on the head node has the following errors:
Jul 31 15:52:20 nona-man PBS_Server: stream_eof, connection to node1003 is bad, remote service may be down, message may be corrupt, or connection may have been dropped remotely (End of File).  setting node state to down
Jul 31 15:52:50 node1003 pbs_mom: start_exec, no password entry for user mef
Jul 31 15:52:50 nona-man kernel: maui[20200]: segfault at 0000000000000694 rip 0000000000456a21 rsp 000007fbff4bb00 error 4
Jul 31 15:52:50 node1003 pbs_mom: open_std_file, cannot determine filename
Jul 31 15:52:55 node1003 pbs_mom: open_std_file, cannot determine filename
Jul 31 15:52:55 node1003 pbs_mom: Success (0) in fork_to_user, cannot find user 'mef' in password file
Jul 31 15:52:55 node1003 pbs_mom: Inappropriate ioctl for device (25) in req_cpyfile, fork_to_user failed with rc=-15023 'cannot find user 'mef' in password file' - returning failure

When I run pbsnodes -a, node1003 is listed as "free".

And just in case that is not enough, maui crashes as indicated by the 
segfault above.

Any advice/help would be great.


-- 
Thanks
Mary Ellen


