[torqueusers] Server not talking to MOMs at all

Prakash Velayutham velayups at email.uc.edu
Mon Aug 15 13:24:41 MDT 2005


Hi All,

I have a small system with just one server and one compute node. The
server daemon starts up fine on the server machine, and the MOM starts
up fine on the compute node, but there is no communication between the
two. The strange thing is that they were talking for almost half a day
at some point last Wednesday. After I restarted the MOM and the server
(to add more nodes), all the nodes show up as state-unknown,down in
"pbsnodes". Even after I removed the newly added nodes, things have not
gone back to normal.

Here is the output of momctl -d 4 -h yy.yy.yy.yy (on the compute node):

Host: xylose/xylose.dmzcluster.cchmc.org   Server: fructose   Version: torque_1.2.0p5
HomeDirectory:          /var/spool/torque/mom_priv
MOM active:             15756 seconds
WARNING:  no messages received from server
Last Msg To Server:     0 seconds
Server Update Interval: 20 seconds
WARNING:  no hello/cluster-addrs messages received from server
Init Msgs Sent:         1581 hellos
LOGLEVEL:               0 (use SIGUSR1/SIGUSR2 to adjust)
Communication Model:    RPP
TCP Timeout:            20 seconds
Prolog Alarm Time:      300 seconds
Alarm Time:             0 of 10 seconds
Trusted Client List:    192.168.1.254,205.142.199.176,192.168.1.51,127.0.0.1
JobList:                NONE

diagnostics complete
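
For anyone who wants to watch for this condition automatically, here is a
minimal sketch that scans momctl diagnostic text for WARNING lines and
collects the "Name: value" fields. The function name parse_momctl is
hypothetical (it is not part of TORQUE), and the layout it expects is
assumed from the output shown above, so other versions may print
something different.

```python
import re


def parse_momctl(text):
    """Split momctl diagnostic text into fields and warning messages.

    Hypothetical helper: the "Name: value" and "WARNING: ..." layout is
    assumed from the sample output in this post, not from any TORQUE spec.
    """
    fields = {}
    warnings = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("WARNING:"):
            # Keep only the message part of the warning line.
            warnings.append(line[len("WARNING:"):].strip())
            continue
        # Match "Some Name:   value"; lines without a colon are skipped.
        match = re.match(r"^([A-Za-z][A-Za-z /]*?):\s+(.*)$", line)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields, warnings
```

It could be fed the saved output of the momctl command above; a non-empty
warnings list, together with a large "Init Msgs Sent" count, would match
the symptom described here.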

When the server daemon starts, it also does not seem to detect that the
node is available. It just does not see it.
As a side note, I noticed someone on the list saying that netfilter
iptables might cause this. I have masquerading set up on the server.
Could that affect this?
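
One way to narrow down the firewall question is to test whether a TCP
connection can be opened between the two machines at all. The sketch
below is only a rough check under stated assumptions: tcp_port_open is a
hypothetical helper, and the ports in the usage comment (15001 for the
server, 15002 for the MOM) are the usual TORQUE defaults, which may not
match this installation. Since the MOM reports its communication model
as RPP, which as far as I know runs over UDP, a successful TCP connect
would not rule out a problem with the UDP traffic.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper for a quick reachability test. It only covers
    TCP, so it says nothing about UDP traffic between the daemons.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name lookup failures.
        return False


# Example use (host names from this post; ports are assumed defaults):
#   tcp_port_open("fructose", 15001)   # run on the compute node
#   tcp_port_open("xylose", 15002)     # run on the server
```

If the check fails in one direction only, the iptables rules on the
server would be the first place to look.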

Any help greatly appreciated.

Thanks,
Prakash

