[torqueusers] Server not talking to MOMs at all

Prakash Velayutham velayups at email.uc.edu
Mon Aug 15 16:28:37 MDT 2005


Garrick Staples wrote:

>On Mon, Aug 15, 2005 at 05:11:19PM -0400, Prakash Velayutham alleged:
>
>>Garrick Staples wrote:
>>
>>>On Mon, Aug 15, 2005 at 03:24:41PM -0400, Prakash Velayutham alleged:
>>>
>>>>Here is the output of momctl -d 4 -h yy.yy.yy.yy (on the compute node):
>>>>
>>>Does this work from the server?  Anything interesting in server's log files?
>>>
>>Hi Garrick,
>>
>>This is what I get from the server
>>
>>Host: xylose/xylose.dmzcluster.cchmc.org   Server: fructose   Version: torque_1.2.0p5
>>HomeDirectory:          /var/spool/torque/mom_priv
>>MOM active:             6223 seconds
>>WARNING:  no messages received from server
>>Last Msg To Server:     20 seconds
>>Server Update Interval: 20 seconds
>>WARNING:  no hello/cluster-addrs messages received from server
>>Init Msgs Sent:         624 hellos
>>LOGLEVEL:               0 (use SIGUSR1/SIGUSR2 to adjust)
>>Communication Model:    RPP
>>TCP Timeout:            20 seconds
>>Prolog Alarm Time:      300 seconds
>>Alarm Time:             0 of 10 seconds
>>Trusted Client List:    192.168.1.254,205.142.199.176,192.168.1.51,127.0.0.1
>>JobList:                NONE
>>
>>diagnostics complete
>>
>>Nothing strange at all in the server logs.
>
>From that Trusted Client List, I'm making the following assumptions:
>  xylose's IP is 192.168.1.51
>  fructose has two interfaces: 192.168.1.254 and 205.142.199.176.
>  xylose doesn't have access to your "live" 205.142 network and you intend for
>    all cluster traffic to be on the 192.168 network.
>
>Verify that the first $clienthost in your mom config resolves to 192.168.1.254
>with matching forward and reverse lookups.
>
>Also verify that the names in $PBSHOME/server_priv/nodes resolve to
>192.168.1.51 with matching forward and reverse lookups.
>
>Either MOM is sending HELLOs to the wrong IP, the HELLOs are being blocked by
>port filtering or a firewall, or the server is sending INIT messages back to
>the wrong place.  My suspicion is the first possibility.
>
>Crank the loglevels on mom and server all the way up to 9.  MOM will log where
>it is sending HELLOs, server will log who it got HELLOs from.
>
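A minimal sketch of the forward/reverse check suggested above, in Python. The function name check_host and its return keys are hypothetical, not part of TORQUE; the resolvers default to the standard socket calls but can be swapped for stubs. It looks up the name, reverse-resolves the resulting IP, and then resolves that reverse name again to see whether the round trip lands on the same address.

```python
import socket


def check_host(name, expected_ip,
               forward=socket.gethostbyname,
               reverse=socket.gethostbyaddr):
    """Resolve name -> IP -> name -> IP and report whether it is consistent.

    check_host is a hypothetical helper for illustration only.
    """
    ip = forward(name)                     # forward lookup of the configured name
    reverse_name = reverse(ip)[0]          # primary name from the reverse lookup
    round_trip_ip = forward(reverse_name)  # forward lookup of that reverse name
    return {
        "name": name,
        "ip": ip,
        "reverse_name": reverse_name,
        "ip_as_expected": ip == expected_ip,
        "forward_reverse_match": round_trip_ip == ip,
    }
```

Run on the compute node, check_host("fructose", "192.168.1.254") would use the real resolver; an "ip_as_expected" value of False would point at the first possibility listed above.
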
Earlier, my MOM config file was:
#########################
$restricted     192.168.1.254
$restricted     fructose.cchmc.org
$restricted     205.142.199.176
$restricted     transferase.dmzcluster.cchmc.org
$logevent       255

$clienthost     fructose
$clienthost  transferase
$clienthost  xylose
##############################

When I changed it to,
###################################
$restricted     192.168.1.254
$restricted     transferase.dmzcluster.cchmc.org
$restricted     205.142.199.176
$restricted     fructose.cchmc.org
$logevent       255

$clienthost  transferase
$clienthost     fructose
$clienthost  xylose
####################################

everything started to work. As you had guessed, fructose is the name for 
the public 205.142.199.* interface and transferase is the name for the 
private 192.168.1.* interface. Why does this fix it? Which of these lines 
made the difference? I am puzzled. Could you explain it, please? Just to 
note, my Masquerade is still working fine.
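
Since the earlier advice singled out the first $clienthost entry, one way to compare the two files is to list their $clienthost names in order. This is a rough Python sketch; the helper name clienthosts is hypothetical, and it only compares the files. It does not show how pbs_mom itself treats the entries.

```python
def clienthosts(config_text):
    """Return the $clienthost names in the order they appear in the file."""
    hosts = []
    for line in config_text.splitlines():
        fields = line.split()
        # Keep only "$clienthost <name>" lines; skip everything else.
        if len(fields) >= 2 and fields[0] == "$clienthost":
            hosts.append(fields[1])
    return hosts


# The two configs quoted in this message.
EARLIER = """\
$restricted     192.168.1.254
$restricted     fructose.cchmc.org
$restricted     205.142.199.176
$restricted     transferase.dmzcluster.cchmc.org
$logevent       255

$clienthost     fructose
$clienthost  transferase
$clienthost  xylose
"""

CHANGED = """\
$restricted     192.168.1.254
$restricted     transferase.dmzcluster.cchmc.org
$restricted     205.142.199.176
$restricted     fructose.cchmc.org
$logevent       255

$clienthost  transferase
$clienthost     fructose
$clienthost  xylose
"""

first_before = clienthosts(EARLIER)[0]  # name on the public interface
first_after = clienthosts(CHANGED)[0]   # name on the private interface
```

The first entry changes from fructose (public) to transferase (private) between the two files, while the set of names stays the same.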

Thanks a lot,
Prakash

