[torqueusers] More pbs_mom communication problems

Hannu Väisänen hvaisane at joyx.joensuu.fi
Mon Feb 28 04:00:50 MST 2005


In the server log I get:

PBS_Server;Svr;check_nodes;node xxxxx not detected in 1152 seconds, marking node down


In the node log I get:

pbs_mom;Svr;pbs_mom;No child processes (10) in is_update_stat, cannot specify protocol
pbs_mom;Svr;pbs_mom;im_eof, Premature end of message from addr nnn.nnn.nnn.nnn:15001
(nnn.nnn.nnn.nnn:15001 is the server.)


When I run

telnet server 15001

on the node, I get "No route to host".
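The telnet test can also be run from a script that reports which kind of failure occurred: no route to host, connection refused, or a silent timeout. Each of these points at a different cause. Below is a minimal sketch that assumes Python 3 is installed on the node. probe_tcp is a hypothetical helper name and is not part of TORQUE.

```python
import errno
import socket


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Attempt a TCP connection to host:port, much like `telnet host port`.

    probe_tcp is a hypothetical helper. It returns a short label for the
    outcome instead of raising, so the failure mode is easy to compare.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # No reply within `timeout` seconds: packets are probably being dropped.
        return "timeout"
    except OSError as exc:
        if exc.errno == errno.ECONNREFUSED:
            # A reset came back: the host is reachable, but the port refused.
            return "refused"
        if exc.errno == errno.EHOSTUNREACH:
            # The error telnet prints as "No route to host".
            return "no route to host"
        if exc.errno == errno.ETIMEDOUT:
            # The kernel gave up on the connect before our own timeout fired.
            return "timeout"
        return f"error: {exc}"
```

On the node, probe_tcp("server", 15001) (with the real server name) should return "no route to host" if it reproduces the telnet result. Running the same call on the server against the node's pbs_mom ports checks the opposite direction.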

SSH to and from the node works.



On both the server and the node, iptables-save shows:

-A RH-Firewall-1-INPUT -p tcp -m state --state NEW -m tcp --dport 15001 -j ACCEPT
-A RH-Firewall-1-INPUT -p tcp -m state --state NEW -m tcp --dport 15004 -j ACCEPT
-A RH-Firewall-1-INPUT -p tcp -m state --state NEW -m tcp --dport 15003 -j ACCEPT
-A RH-Firewall-1-INPUT -p udp -m state --state NEW -m udp --dport 15003 -j ACCEPT
-A RH-Firewall-1-INPUT -p tcp -m state --state NEW -m tcp --dport 15002 -j ACCEPT


pbsnodes -a on the server says the node is down.


Any ideas on how to proceed?

