[torqueusers] torque-2.0.0p8 submit job error

Hristo Iliev hristo at mc.phys.uni-sofia.bg
Fri Mar 24 10:08:45 MST 2006


On Fri, 2006-03-24 at 15:09 +0800, luxun wrote:
> Dear all,
> 
> I am trying torque-2.0.0p8 on RHEL 4 WS x86_64.
> There are 2 hosts in my testing environment.
> Host A is the nfs server + yp server + torque server + torque scheduler.
> Host B is an nfs client + yp client + torque client.
> On host B, I can read/write the nfs server's file system, and yp users
> can log in to host B.
> When I log in to the torque server and submit a job, the job's
> exit_status is -1.
> Host B's mom_log contains the following error messages:
> 03/24/2006 14:41:21;0100;   pbs_mom;Req;;Type QueueJob request
> received from PBS_Server at i160.ascc, sock=11
> 03/24/2006 14:41:21;0100;   pbs_mom;Req;;Type JobScript request
> received from PBS_Server at i160.ascc, sock=11 
> 03/24/2006 14:41:21;0100;   pbs_mom;Req;;Type ReadyToCommit request
> received from PBS_Server at i160.ascc, sock=11 
> 03/24/2006 14:41:21;0100;   pbs_mom;Req;;Type Commit request received
> from PBS_Server at i160.ascc, sock=11
> 03/24/2006 14:41:21;0001;   pbs_mom;Svr;pbs_mom;start_exec, job
> 10.i160.ascc check_pwd failed
> 03/24/2006 14:41:21;0008;   pbs_mom;Req;send_sisters;sending ABORT to
> sisters
> 03/24/2006 14:41:21;0100;   pbs_mom;Req;;Type StatusJob request
> received from PBS_Server at i160.ascc, sock=13 
> 03/24/2006 14:41:21;0100;   pbs_mom;Req;;Type CopyFiles request
> received from PBS_Server at i160.ascc, sock=11 
> 03/24/2006 14:41:21;0001;   pbs_mom;Svr;pbs_mom;Success (0) in
> fork_to_user, cannot find user 'wzlu' in password file
> 03/24/2006 14:41:21;0080;   pbs_mom;Req;req_reject;Reject reply
> code=15023(Bad UID for job execution REJHOST=i159.ascc MSG=cannot find
> user 'wzlu' in password file), aux=0, type=CopyFiles, from
> PBS_Server at i160.ascc
> 03/24/2006 14:41:21;0001;   pbs_mom;Svr;pbs_mom;Inappropriate ioctl
> for device (25) in req_cpyfile, fork_to_user failed with rc=-15023
> 'cannot find user 'wzlu' in password file' - returning failure
> 03/24/2006 14:41:21;0100;   pbs_mom;Req;;Type DeleteJob request
> received from PBS_Server at i160.ascc, sock=11 
> 
> pbs_mom cannot find the user. Any ideas?
> Thanks a lot. 

This looks like a NIS (yp) issue to me. Can you see user 'wzlu' in the
output of the 'ypcat passwd' command when it is run on host B?
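
As a complementary check, the short Python sketch below asks the C library
for the account, much as a daemon would. It assumes that pbs_mom resolves
users through the standard getpwnam lookup, which the "cannot find user
'wzlu' in password file" message suggests but which I have not verified in
the source. The helper names lookup_user and describe are made up for this
example.

```python
import pwd


def lookup_user(name):
    """Return the passwd entry for name, or None if it does not resolve."""
    try:
        # pwd.getpwnam wraps the C library lookup, so on a typical Linux
        # host it consults the sources configured for passwd lookups
        # (local files, NIS, ...), not just /etc/passwd.
        return pwd.getpwnam(name)
    except KeyError:
        return None


def describe(name):
    """Build a one-line summary of how the account resolves on this host."""
    entry = lookup_user(name)
    if entry is None:
        return "%s: not found via getpwnam" % name
    return "%s: uid=%d gid=%d home=%s shell=%s" % (
        name, entry.pw_uid, entry.pw_gid, entry.pw_dir, entry.pw_shell)


# 'wzlu' is the user name taken from the mom_log above; run this on host B.
print(describe("wzlu"))
```

If 'ypcat passwd' lists the user but this lookup reports "not found", one
likely place to look is the passwd line in /etc/nsswitch.conf on host B,
which may not include nis.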

H


