[torqueusers] Torque 2.3.7 and connection Refused

Anil Thapa anilth at hi.is
Wed Jul 15 05:45:23 MDT 2009


Anil Thapa wrote:
> Hello Prakash,
>
> I am running into an LDAP problem. I added both the head node and the
> compute node to LDAP, and the home directories are exported to both.
> When I log in to the head node as an LDAP user, I get the right home
> directory (that is good and expected). When I submit a job as an LDAP
> user, it is accepted and dispatched to the compute node, but nothing
> happens. The mom_logs then show this:
>
>
> 10:38:54;0008;   pbs_mom;Job;130.bhairab.rhi.hi.is;attempting to copy file 'bhairab.rhi.hi.is:/users/annad/peter/example.sh.o130'
> 07/14/2009 10:38:54;0001;   pbs_mom;Svr;pbs_mom;LOG_ERROR::Success (0) in fork_to_user, cannot find user 'peter' in password file
> 07/14/2009 10:38:54;0080;   pbs_mom;Req;req_reject;Reject reply code=15023(Bad UID for job execution REJHOST=305.rhi.hi.is MSG=cannot find user 'peter' in password file), aux=0, type=CopyFiles, from PBS_Server at bhairab.rhi.hi.is
> 07/14/2009 10:38:54;0001;   pbs_mom;Svr;pbs_mom;LOG_ERROR::Inappropriate ioctl for device (25) in req_cpyfile, fork_to_user failed with rc=-15023 'cannot find user 'peter' in password file' - returning failure
> 07/14/2009 10:38:54;0080;   pbs_mom;Req;dis_request_read;decoding command DeleteJob from PBS_Server
>
> When an LDAP user submits a job, the output keeps piling up in
> /var/spool/torque/undelivered. The server logs don't say much. My LDAP
> configuration looks like this:
>
> Headnode: /etc/ldap.conf
> uri ldap://neptune.hi.is/ ldaps://satrun.rhi.hi.is/
> ssl on
> tls_cacertdir /etc/openldap/cacerts
>
> Headnode: /etc/openldap/ldap.conf
> uri ldap://neptune.hi.is/ ldaps://satrun.rhi.hi.is/
> ssl on
> tls_cacertdir /etc/openldap/cacerts
>
> Headnode: /etc/nsswitch.conf
> passwd:     files ldap
> shadow:     files ldap
> group:      files ldap
>
> ethers:     files
> netmasks:   files
> networks:   files
> protocols:  files
> rpc:        files
> services:   files
> netgroup:   files ldap
> publickey:  nisplus
> automount:  files ldap
> aliases:    files nisplus
>
>
> These configurations are identical on the compute node. Any hints or
> input would be great.
>
> Anil
>
>>> Hello
>>>
>>> Thanks for your help. I removed TORQUE completely from the server and
>>> the client, then rebuilt it with ./configure --with-scp on both the
>>> head node and the client. Job submission worked fine, but the job
>>> output always ended up in the /var/spool/torque/undelivered directory.
>>> Then I followed section 6.1.5, "Enabling Bi-Directional SCP Access",
>>> from Cluster Resources
>>> "http://www.clusterresources.com/products/torque/docs/6.1scpsetup.shtml".
>>> I created identical local users on both the head node and the compute
>>> node with the same UID. Then it worked as it was supposed to: jobs are
>>> sent to the compute node and the results come back to the user's
>>> /home/user directory. At least this works, but it is not an ideal
>>> approach.
>>>
>>> Ideally, I would not have to create a local user and home directory
>>> for every user. I was thinking of adding every compute node to the
>>> LDAP server and exporting the home directories to them, as I do for
>>> my head node. What are your thoughts on this?
>>
>> It is supposed to work that way if LDAP and the automounter are
>> configured correctly and your home directory server's export list
>> allows the head node and compute nodes to mount the relevant
>> directories with the required permissions.
>>
>>>
>>> However, users still have to run ssh-keygen -t rsa and exchange keys
>>> in both directions so that the compute node can send the results back
>>> (or is there a better option?).
>>
>> Not needed. Please see the $usecp directive in the pbs_mom
>> configuration file (mom_priv/config). As the home directories are
>> uniform and automounted across the whole cluster, MOM just needs to
>> cp the output and error files instead of using any kind of remote
>> copy.
>>
>>>
>>> Thanks and have a good weekend.
>>>
>>> A
>>
>> Prakash
>
>
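The mom_logs error above means pbs_mom on the compute node could not resolve the user 'peter' when it tried to switch to that user for the file copy. One way to confirm whether the compute node itself resolves LDAP accounts through /etc/nsswitch.conf is a small check run on that node. This is a diagnostic sketch of our own, not part of TORQUE; it assumes pbs_mom's lookup behaves like the C library's getpwnam(), which Python's standard pwd module also calls. The script name check_nss_user.py is hypothetical; from the shell, `getent passwd peter` gives the same answer.

```python
import pwd
import sys


def check_user(name: str) -> bool:
    """Return True if the host's NSS setup (files, ldap, ...) resolves `name`.

    pwd.getpwnam() goes through the C library's getpwnam(), so it consults
    the sources listed on the "passwd:" line of /etc/nsswitch.conf.
    """
    try:
        entry = pwd.getpwnam(name)
    except KeyError:
        # Assumed to be the same condition pbs_mom logs as
        # "cannot find user '<name>' in password file".
        print(f"{name}: NOT resolvable on this host")
        return False
    print(f"{name}: uid={entry.pw_uid} gid={entry.pw_gid} home={entry.pw_dir}")
    return True


if __name__ == "__main__":
    # Usage (hypothetical script name): python3 check_nss_user.py peter annad
    for user in sys.argv[1:]:
        check_user(user)
```

Running it for the same user on the head node and on each compute node should print the same UID everywhere. A "NOT resolvable" line on a compute node points at that node's LDAP client or NSS configuration rather than at TORQUE.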

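Prakash's suggestion relies on pbs_mom recognising that a destination such as 'bhairab.rhi.hi.is:/users/annad/peter/example.sh.o130' is also reachable locally, so it can use cp instead of a remote copy. A $usecp entry in mom_priv/config maps a host pattern and path prefix to a local path prefix, for example `$usecp *.rhi.hi.is:/users /users`. That particular line is an assumption, based on the /users paths in the log and on /users being automounted at the same place on every node. The Python below is a simplified model of the mapping for illustration only; it is not TORQUE's code, and the exact wildcard and prefix rules pbs_mom applies may differ.

```python
from fnmatch import fnmatch
from typing import List, Optional, Tuple

# Simplified model (not TORQUE source) of a "$usecp" line in mom_priv/config:
#   $usecp <host_pattern>:<remote_prefix> <local_prefix>
UsecpRule = Tuple[str, str, str]  # (host_pattern, remote_prefix, local_prefix)


def parse_usecp(line: str) -> UsecpRule:
    """Split one '$usecp host:/prefix /local/prefix' line into its three parts."""
    keyword, spec, local_prefix = line.split()
    if keyword != "$usecp":
        raise ValueError(f"not a $usecp line: {line!r}")
    host_pattern, remote_prefix = spec.split(":", 1)
    return host_pattern, remote_prefix, local_prefix


def local_path(destination: str, rules: List[UsecpRule]) -> Optional[str]:
    """Map 'host:/path' to a local path via the first matching rule.

    Returns None when no rule matches, i.e. a remote copy would be needed.
    """
    host, path = destination.split(":", 1)
    for host_pattern, remote_prefix, local_prefix in rules:
        prefix = remote_prefix.rstrip("/")
        # Shell-style host match plus a path-component-aware prefix match.
        if fnmatch(host, host_pattern) and (path == prefix or path.startswith(prefix + "/")):
            return local_prefix.rstrip("/") + path[len(prefix):]
    return None


if __name__ == "__main__":
    # Hypothetical rule, assuming /users is automounted identically on all nodes.
    rules = [parse_usecp("$usecp *.rhi.hi.is:/users /users")]
    print(local_path("bhairab.rhi.hi.is:/users/annad/peter/example.sh.o130", rules))
```

With a rule whose local prefix differs, such as `$usecp *.rhi.hi.is:/users /home`, the same destination would map to /home/annad/peter/example.sh.o130. A None result corresponds to the case where no rule applies and MOM would need a remote copy instead.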

