[torqueusers] Torque 2.3.7 and connection Refused
Anil Thapa
anilth at hi.is
Wed Jul 15 05:45:23 MDT 2009
Anil Thapa wrote:
> Hello Prakash,
>
> I am running into an LDAP problem. I added both the head node and the
> compute node to LDAP, and the home directories are exported to both.
> When I log in to the head node with my username I get the right home
> directory (that's good and normal). When I submit a job as an LDAP
> user, it is submitted and dispatched to the compute node, but nothing
> happens. The mom_logs then show this:
>
>
> 10:38:54;0008; pbs_mom;Job;130.bhairab.rhi.hi.is;attempting to copy file 'bhairab.rhi.hi.is:/users/annad/peter/example.sh.o130'
> 07/14/2009 10:38:54;0001; pbs_mom;Svr;pbs_mom;LOG_ERROR::Success (0) in fork_to_user, cannot find user 'peter' in password file
> 07/14/2009 10:38:54;0080; pbs_mom;Req;req_reject;Reject reply code=15023(Bad UID for job execution REJHOST=305.rhi.hi.is MSG=cannot find user 'peter' in password file), aux=0, type=CopyFiles, from PBS_Server at bhairab.rhi.hi.is
> 07/14/2009 10:38:54;0001; pbs_mom;Svr;pbs_mom;LOG_ERROR::Inappropriate ioctl for device (25) in req_cpyfile, fork_to_user failed with rc=-15023 'cannot find user 'peter' in password file' - returning failure
> 07/14/2009 10:38:54;0080; pbs_mom;Req;dis_request_read;decoding command DeleteJob from PBS_Server
>
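The MOM error says the password lookup for `peter` failed on the compute node. A quick way to reproduce that lookup outside TORQUE is a minimal sketch like the one below, assuming Python is available on the compute node. Python's `pwd.getpwnam` goes through the C library and therefore through `/etc/nsswitch.conf` (files, then ldap), which is presumably the same kind of lookup pbs_mom performs; the helper name `resolve_user` is hypothetical.

```python
import pwd
import sys


def resolve_user(name):
    """Look up `name` through the system's NSS passwd sources.

    Returns (uid, home_directory), or None if no configured source
    (local files, LDAP, ...) knows the user.
    """
    try:
        entry = pwd.getpwnam(name)
    except KeyError:
        # Raised when no NSS source returns an entry for this name.
        return None
    return entry.pw_uid, entry.pw_dir


if __name__ == "__main__":
    # Check the users given on the command line, or 'peter' by default.
    for user in sys.argv[1:] or ["peter"]:
        result = resolve_user(user)
        if result is None:
            print(f"{user}: not found via NSS (pbs_mom likely fails the same way)")
        else:
            uid, home = result
            print(f"{user}: uid={uid} home={home}")
```

Run it on both the head node and the compute node; if the user resolves on one but not the other, the LDAP client side of the compute node is the place to look. `getent passwd peter` performs the same check from the shell.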
> Whenever one of the LDAP users submits a job, the results keep piling
> up in /var/spool/torque/undelivered. The server logs don't say much.
> My LDAP configuration looks like this:
>
> Headnode: /etc/ldap.conf
> uri ldap://neptune.hi.is/ ldaps://satrun.rhi.hi.is/
> ssl on
> tls_cacertdir /etc/openldap/cacerts
>
> Headnode: /etc/openldap/ldap.conf
> uri ldap://neptune.hi.is/ ldaps://satrun.rhi.hi.is/
> ssl on
> tls_cacertdir /etc/openldap/cacerts
>
> Headnode: /etc/nsswitch.conf
> passwd: files ldap
> shadow: files ldap
> group: files ldap
>
> ethers: files
> netmasks: files
> networks: files
> protocols: files
> rpc: files
> services: files
> netgroup: files ldap
> publickey: nisplus
> automount: files ldap
> aliases: files nisplus
>
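Because the compute node must resolve LDAP users exactly as the head node does, it can help to confirm that each node's `nsswitch.conf` actually lists `ldap` for `passwd`, `group` and `shadow`. The sketch below is a simplified, hypothetical parser (the names `nss_sources` and `databases_without` are illustrative); it skips comments and bracketed action items such as `[NOTFOUND=return]`, but does not implement the full nsswitch grammar.

```python
def nss_sources(conf_text, database):
    """Return the ordered lookup sources for `database` (e.g. 'passwd')
    from nsswitch.conf-style text, or [] if the database is not listed.

    Simplification: comments are stripped and bracketed action items
    like [NOTFOUND=return] are ignored rather than interpreted.
    """
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue
        name, _, rest = line.partition(":")
        if name.strip() == database:
            return [tok for tok in rest.split() if not tok.startswith("[")]
    return []


def databases_without(conf_text, source, databases=("passwd", "group", "shadow")):
    """List the databases whose source list does not include `source`."""
    return [db for db in databases if source not in nss_sources(conf_text, db)]


if __name__ == "__main__":
    # The relevant entries from the head node configuration posted above.
    posted = "passwd: files ldap\nshadow: files ldap\ngroup: files ldap\n"
    print(nss_sources(posted, "passwd"))       # ['files', 'ldap']
    print(databases_without(posted, "ldap"))   # []
```

Feeding it the contents of `/etc/nsswitch.conf` from each compute node gives a quick way to spot a node whose configuration has drifted from the head node's.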
>
> These configurations are identical on the compute node. Any hints or
> input would be great.
>
> Anil
>
>>> Hello
>>>
>>> Thanks for your help. I actually removed TORQUE entirely from the
>>> server and the client, then rebuilt it with ./configure --with-scp on
>>> both the head node and the client. Job submission was working fine,
>>> but the job output always ended up in the /var/spool/torque/undelivered
>>> directory. Then I followed section 6.1.5, "Enabling Bi-Directional SCP
>>> Access", from Cluster Resources:
>>> "http://www.clusterresources.com/products/torque/docs/6.1scpsetup.shtml".
>>> I created identical local users on both the head node and the compute
>>> node with the same UID. Then it worked as it was supposed to: jobs are
>>> sent to the compute node and the results come back to the user's
>>> /home/user directory. At least it seems to work, but this is not an
>>> ideal solution.
>>>
>>> Ideally, I would not have to create a local user and home directory
>>> for every user. I was thinking of adding every compute node to the
>>> LDAP server and exporting the home directories to them as I do for my
>>> head node. What are your thoughts on this?
>>
>> It is supposed to work that way if LDAP and automounter are
>> configured correctly and your home directory server's export list
>> allows head node and compute nodes to mount the relevant directory
>> with the required permissions.
>>
>>>
>>> However, users still have to run ssh-keygen -t rsa and set up keys in
>>> both directions so that the compute node can send the results back.
>>> Or is there a better option?
>>
>> Not needed. Please see the $usecp directive in the pbs_mom
>> configuration file. Since the home directories are uniform and
>> automounted across the whole cluster, MOM only needs to cp the output
>> and error files instead of using any kind of remote copy.
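The idea behind `$usecp` is that the MOM's config file (under `mom_priv/` in the TORQUE spool directory, here presumably `/var/spool/torque/mom_priv/config`) maps a `host:path` destination to a local path, so delivery becomes a plain local copy. For this cluster, a line such as `$usecp *.rhi.hi.is:/users /users` is one plausible choice, assuming the home directories are mounted at `/users` on every node as the log path suggests. The sketch below is a simplified model of that mapping for illustration only, not TORQUE's actual matching code; `parse_usecp` and `local_copy_target` are hypothetical names, and host matching is approximated with shell-style wildcards via `fnmatch`.

```python
from fnmatch import fnmatch


def parse_usecp(line):
    """Split '$usecp HOST:REMOTE_DIR LOCAL_DIR' into its three parts."""
    directive, spec, local_prefix = line.split()
    if directive != "$usecp":
        raise ValueError(f"not a $usecp line: {line!r}")
    host_pattern, _, remote_prefix = spec.partition(":")
    return host_pattern, remote_prefix, local_prefix


def local_copy_target(dest, rules):
    """Given a 'host:/path' destination, return the local path a plain cp
    could write to, or None if a remote copy (scp/rcp) would be needed."""
    host, _, path = dest.partition(":")
    for host_pattern, remote_prefix, local_prefix in rules:
        prefix = remote_prefix.rstrip("/")
        # Match the whole directory component, so '/users' does not match '/usersfoo'.
        if fnmatch(host, host_pattern) and (path == remote_prefix or path.startswith(prefix + "/")):
            return local_prefix.rstrip("/") + path[len(prefix):]
    return None


if __name__ == "__main__":
    # Hypothetical rule for this cluster; adjust to the real mount point.
    rules = [parse_usecp("$usecp *.rhi.hi.is:/users /users")]
    # Destination taken from the mom_logs excerpt above.
    print(local_copy_target("bhairab.rhi.hi.is:/users/annad/peter/example.sh.o130", rules))
```

With a matching rule the output file from the log above maps to `/users/annad/peter/example.sh.o130` on the compute node itself, which is why no SSH keys are needed once the automounted home directories are identical everywhere.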
>>
>>>
>>> Thanks and have a good weekend.
>>>
>>> A
>>
>> Prakash
>
>