[torqueusers] Server not talking to MOMs at all
velayups at email.uc.edu
Mon Aug 15 13:24:41 MDT 2005
I have a minimal setup: one server and one compute node. The server daemon
starts up just fine on the server machine and the MOM starts up just fine
on the compute node, but there is no communication between the two. The
strange thing is that the two were talking for almost half a day sometime
last Wednesday. After I restarted the MOM and the server (to add more
nodes), all the nodes now show up as state-unknown,down in "pbsnodes". Even
after I removed the newly added nodes, things have not gone back to normal.
Here is the output of momctl -d 4 -h yy.yy.yy.yy (on the compute node):
Host: xylose/xylose.dmzcluster.cchmc.org Server: fructose Version:
MOM active: 15756 seconds
WARNING: no messages received from server
Last Msg To Server: 0 seconds
Server Update Interval: 20 seconds
WARNING: no hello/cluster-addrs messages received from server
Init Msgs Sent: 1581 hellos
LOGLEVEL: 0 (use SIGUSR1/SIGUSR2 to adjust)
Communication Model: RPP
TCP Timeout: 20 seconds
Prolog Alarm Time: 300 seconds
Alarm Time: 0 of 10 seconds
Trusted Client List: 192.168.1.254,188.8.131.52,192.168.1.51,127.0.0.1
When the server daemon starts, it does not detect that the node is
available either. It simply does not see it.
As a sidenote, I noticed someone on the list saying that netfilter/iptables
might cause this. I have masquerading set up on the server. Could that
affect this?
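
To rule the firewall in or out, a quick reachability test along these lines
might help. This is only a sketch under assumptions: it uses what I
understand to be the default TORQUE ports (15001 for pbs_server, 15002 and
15003 for pbs_mom), and the function name tcp_port_open is made up for
illustration. The MOM reports "Communication Model: RPP", which runs over
UDP, so a successful TCP connect would not prove that the UDP traffic gets
through as well.

```python
import socket
import sys

# Default TORQUE ports (assumed; a site can override them at build or start time).
DEFAULT_PORTS = {
    "pbs_server": 15001,
    "pbs_mom": 15002,
    "pbs_mom resource monitor": 15003,
}


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Hosts to probe are taken from the command line; with no arguments nothing runs.
for host in sys.argv[1:]:
    for name, port in DEFAULT_PORTS.items():
        state = "reachable" if tcp_port_open(host, port) else "NOT reachable"
        print(f"{host}:{port} ({name}): {state}")
```

Saved under a hypothetical name such as check_ports.py, it could be run on
the compute node as "python check_ports.py fructose" and on the server as
"python check_ports.py xylose". Only the port of the daemon running on the
target host is expected to answer, so from the compute node the interesting
line is 15001 on fructose, and from the server it is 15002 and 15003 on
xylose.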
Any help greatly appreciated.