[torqueusers] node status is down, but node is normal: Some things I check when this happens.
Coyle, James J [ITACD]
jjc at iastate.edu
Wed Apr 21 08:46:33 MDT 2010
Some things I check when something like this happens.
Log in to the compute node and check /var/spool/torque/server_name to make
sure that it contains the name of the head node (i.e. the node running pbs_server).
If the name is correct, try ssh from the compute node back to the head node.
If that fails, /etc/resolv.conf and/or /etc/hosts may be incorrect and may
need to be copied over from another compute node.
Check that the compute node is listed in the nodes file on the head node
(normally /var/spool/torque/server_priv/nodes). You may need to restart
pbs_server if you just added it.
Check firewalls, especially on the head node (PBS ports may be open only for
a range of addresses).
Make sure that pbs_server is not running on the compute node.
Check that /var/spool/torque/mom_priv/config matches the one on the other compute nodes.
Lastly, you may need to restart pbs_mom in case it has become corrupted.
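
The first few checks above (server_name contents, name resolution back to the head node, and whether the pbs_server port is reachable) can be sketched in Python roughly as follows. This is only an illustration, not part of TORQUE: the function names are made up for this example, the path is the one mentioned above, and port 15001 is assumed to be the pbs_server port because it is the TORQUE default. Your site may use a different one.

```python
import socket
from pathlib import Path

# Path taken from the message above; adjust if TORQUE lives elsewhere.
SERVER_NAME_FILE = "/var/spool/torque/server_name"
# Assumption: pbs_server listens on the TORQUE default port 15001.
PBS_SERVER_PORT = 15001


def read_server_name(path=SERVER_NAME_FILE):
    """Return the head-node name recorded in server_name, or None if the file is missing."""
    p = Path(path)
    if not p.is_file():
        return None
    return p.read_text().strip()


def resolves(hostname):
    """True if this node can turn hostname into an address (/etc/hosts or DNS)."""
    try:
        socket.gethostbyname(hostname)
        return True
    except (OSError, UnicodeError):
        return False


def port_open(host, port, timeout=3.0):
    """True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_head_node(path=SERVER_NAME_FILE, port=PBS_SERVER_PORT):
    """Run the checks in order; return a list of (description, passed) pairs."""
    results = []
    head = read_server_name(path)
    results.append(("server_name file is present and non-empty", bool(head)))
    if not head:
        return results  # nothing further can be tested without a name
    ok = resolves(head)
    results.append((f"{head} resolves", ok))
    if ok:
        results.append((f"{head}:{port} accepts TCP connections",
                        port_open(head, port)))
    return results


if __name__ == "__main__":
    for description, passed in check_head_node():
        print(("OK   " if passed else "FAIL ") + description)
```

Run it on the compute node that shows as down. A failure at the second step points at /etc/hosts or /etc/resolv.conf, and a failure at the third points at a firewall or at pbs_server not listening where expected. It does not replace the ssh test, the nodes file check, or the mom_priv/config comparison.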
James Coyle, PhD
High Performance Computing Group
115 Durham Center
Iowa State Univ.
Ames, Iowa 50011 web: http://www.public.iastate.edu/~jjc
From: torqueusers-bounces at supercluster.org [mailto:torqueusers-bounces at supercluster.org] On Behalf Of Zhang Yang
Sent: Wednesday, April 21, 2010 1:43 AM
To: Torque Users
Subject: [torqueusers] node status is down, but node is normal
Our cluster runs torque 2.0.0 and maui 3.2.6, and everything else is OK. But I found one strange node: 'diagnose -n' shows its status as down, yet I can ssh to the
node and pbs_mom is running. Has anybody met this problem, or can anyone give me some suggestions? Thanks!
Lanzhou University
Email: zhyang at lzu.edu.cn