[torqueusers] job failure- cannot find user in password file

Hatem Elshazly hmelshazly at gmail.com
Fri Jun 21 07:07:28 MDT 2013


OK, I've worked it out.
Make sure that every machine in the cluster (head node + compute nodes)
has the same username and uid (user ID). That is, when you submit a job
as user X on the head node, a user X must also exist on every other
machine in the cluster.
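
For anyone who wants to check this before submitting a job, here is a rough
sketch in Python (not part of TORQUE; the function names lookup_uid and
find_mismatches are made up for illustration). It assumes you run the lookup
on each node yourself and collect the results by hand. A missing entry is
the same kind of condition the mom log reports as "no password entry for
user".

```python
import pwd


def lookup_uid(username):
    """Return the numeric uid for username on this machine, or None if absent."""
    try:
        return pwd.getpwnam(username).pw_uid
    except KeyError:
        # pwd.getpwnam raises KeyError when the user has no password entry.
        return None


def find_mismatches(uids_by_host):
    """Given {hostname: uid or None}, list hosts where the user is missing
    or whose uid differs from the first host in the mapping."""
    hosts = list(uids_by_host)
    if not hosts:
        return []
    reference = uids_by_host[hosts[0]]
    return [
        host
        for host in hosts
        if uids_by_host[host] is None or uids_by_host[host] != reference
    ]
```

Run lookup_uid("shazly") on the head node and on each compute node, put the
results in a dict keyed by hostname (head node first), and pass it to
find_mismatches. An empty list means the user exists everywhere with the
same uid.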

Thanks,
shazly



On Mon, Jun 17, 2013 at 6:39 PM, shazly <hmelshazly at gmail.com> wrote:

> Hi there guys,
>
> I'm having a problem with PBS that I hope someone can help me with.
>
> First, here is some helpful info:
> server=hatem-Inspiron-5520
> client=toma-VirtualBox
> shazly= a user on the server
> job id=9.hatem-Inspiron-5520
>
> I installed and configured PBS TORQUE on a mini-cluster (server + one
> client). When I submit a job from the server, I don't get the output files,
> so I checked the mom log on the client machine and found these
> entries:
>
> pbs_mom;Svr;mom_server_add;server hatem-Inspiron-5520 added
> pbs_mom;Svr;pbs_mom;LOG_ALERT::mom_server_valid_message_source, bad connect
> from "the server ip"- unauthorized server
> mom_server_check_connection;sending hello to server 'hatem-Inspiron-5520'
> pbs_mom;LOG_ERROR::start_exec, no password entry for user 'shazly'
> pbs_mom;Req;send_sisters;sending ABORT to sisters for job
> '9.hatem-Inspiron-5520'
> pbs_mom;Svr;pbs_mom;LOG_ERROR::sucess(0) in fork_to_user, cannot find
> 'shazly' in password file
> pbs_mom;Req;req_reject;Reject reply code=15025(BAD UID for job execution
> REJHOST=toma-virtualbox MSG=cannot find user 'shazly' in password file),
> aux=0, type=CopyFiles, from PBS_Server at hatem-Inspiron-5520
> pbs_mom;Svr;pbs_mom;LOG_ERROR::Inappropriate ioctl for device (25) in
> req_cpyfile, fork_to_user failed with rc=-15025 'cannot find user 'shazly'
> in password file'-returning failure
> pbs_mom;Job;removed job script
>
> Also, when I run "qstat -f" on the server after submitting the job, I get:
> sched_hint=Post Job file processing error; job 9.hatem-Inspiron-5520 on
> host toma-VirtualBox/0 BAD UID for job execution REJHOST=toma-virtualBox
> MSG=cannot find user 'shazly' in password file
> exit_status=-1
>
> Everything in /etc/hosts is fine, I can ssh passwordless from server to
> client and vice versa, and I can ping both IPs. I'm frustrated here, so
> any help is appreciated.
>
> Thanks