[torqueusers] Problems with installing torque

Bas van der Vlies basv at sara.nl
Wed Sep 27 00:51:22 MDT 2006


axel wrote:
> On Saturday 23 September 2006 01:49, Garrick Staples wrote:
>> On Fri, Sep 22, 2006 at 11:23:11AM +0200, axel alleged:
>>>>> pbs_sched             15004/tcp
>>>> That's a good sign.  It is actually listening on 15004.
>>>>
>>>>> Hmm ... interesting ...  that it couldn't connect ...
>>>> If it happens once at start up, no biggie.  If it is happening every
>>>> minute or so (or whatever the server's scheduler_iteration is set to),
>>>> then something weird is happening.  Sure there is no port filtering?
>>>> SELinux rules?
>>> It's happening every 10 minutes.
>>> The firewall is stopped, no SELinux rules.
>>>
>>> Installed is openSUSE 10.0, kernel version: 2.6.13-15.10-default
>>>
>>> The pbs_mom could also connect to the server without any problems.
>>>
>>> Could it be that the scheduler is listening on a specific
>>> ethernet adapter? There are 3 ethernet adapters in the computer.
>> As you can see in the lsof output, the scheduler is listening on the IP
>> that the hostname resolved to, which should be the same name used by
>> pbs_server.  Did you redefine the hostname in pbs_server's config?
>>
> 
> 
> Ok .. there I found the problem and a solution.
> The pbs_server and pbs_sched use different network adapters.
> 
> Now I have another problem ...  I see it but don't really know how to
> solve it.
> 
> pbs_mom;Svr;pbs_mom;im_request, bad connect from 192.168.1.2:1022 - 
> unauthorized (okclients: 192.168.0.1, ...
> 
> The problem is that the nodes have 2 network adapters, one for
> compute communication and one for file transfer to the frontend.
> 
> The frontend knows the nodes by the names node01, node02, ... with the
> IPs 192.168.0.*. On the nodes this network is called slow. That means:
> if node02 wants to connect to node01 over this network, then it has to
> connect to slow01 (192.168.0.1).
> But if node02 connects to node01 via the address node01, then it uses
> the IP 192.168.1.1.
> 
> Is there a way to fix it?
> 

We have the same setup: one slow and one fast network. There are two
possible solutions:
  1- Use the fast network names as node names; then the frontend node
must also be connected to the fast network.
  2- If the frontend is on a different network, or you want to use the
slow network for communications, use the slow network names. If there
is an MPI, you can translate the hostnames to the fast hostnames in a
prologue script, or use mpiexec, which can also do some hostname
remapping.
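
To illustrate the hostname translation in option 2, here is a minimal
Python sketch. It assumes the naming scheme described in the quoted
mail (slowNN on the slow network, nodeNN on the fast one); the prefixes,
the function names and the way the script is invoked are all
assumptions for illustration, not something TORQUE or mpiexec provides.
It reads a node file (for example the one named by $PBS_NODEFILE) and
prints the same list with fast-network names.

```python
import sys


def remap_hostname(host, slow_prefix="slow", fast_prefix="node"):
    """Return the fast-network name for a slow-network name.

    The prefixes are assumed for illustration. Names that do not start
    with slow_prefix are returned unchanged.
    """
    if host.startswith(slow_prefix):
        return fast_prefix + host[len(slow_prefix):]
    return host


def remap_nodefile(lines, slow_prefix="slow", fast_prefix="node"):
    """Remap every non-empty line, keeping order and repeated entries."""
    return [
        remap_hostname(line.strip(), slow_prefix, fast_prefix)
        for line in lines
        if line.strip()
    ]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: remap.py <nodefile> > machinefile
    with open(sys.argv[1]) as fh:
        for name in remap_nodefile(fh):
            print(name)
```

Repeated entries are kept on purpose, since a node file may list a host
once per allocated processor. The output could then be handed to the
MPI launcher as its machine file.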

Regards

--
********************************************************************
*                                                                  *
*  Bas van der Vlies                     e-mail: basv at sara.nl      *
*  SARA - Academic Computing Services    phone:  +31 20 592 8012   *
*  Kruislaan 415                         fax:    +31 20 6683167    *
*  1098 SJ Amsterdam                                               *
*                                                                  *
********************************************************************

