[torqueusers] node status is down, but node is normal: Some things I check when this happens.

Coyle, James J [ITACD] jjc at iastate.edu
Wed Apr 21 08:46:33 MDT 2010


   Some things I check when something like this happens.

   Log in to the compute node and check /var/spool/torque/server_name to make
sure that it contains the name of the head node (i.e., the node running pbs_server).
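A minimal Python sketch of the server_name check. It assumes the default /var/spool/torque location given above and a file holding a single hostname; `server_name_matches` is a hypothetical helper, and you supply the head node's name yourself.

```python
from pathlib import Path


def server_name_matches(path: str, expected_head_node: str) -> bool:
    """Return True if the server_name file names the expected head node.

    Assumes the file holds one hostname, possibly followed by a newline
    or other trailing whitespace. Comparison is case-insensitive.
    """
    contents = Path(path).read_text().strip()
    return contents.lower() == expected_head_node.lower()
```

For example, `server_name_matches("/var/spool/torque/server_name", "headnode")`, where "headnode" stands in for your real head node name. Note that a short name versus a fully qualified name counts as a mismatch in this sketch.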

   If the name is correct, try ssh from the compute node back to the head node.
If that fails, /etc/resolv.conf and/or /etc/hosts may be incorrect and may need to be
copied from another compute node.
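If name resolution is the suspect, one way to spot it is to compare the compute node's /etc/hosts with a copy from a node that works. The Python sketch below assumes you have already fetched both files as text (for example with scp); `parse_hosts` and `hosts_differences` are hypothetical helpers, and only /etc/hosts is covered, not DNS lookups configured through /etc/resolv.conf.

```python
def parse_hosts(text: str) -> dict[str, str]:
    """Map each hostname or alias in hosts-file text to its IP address.

    Comments (from '#' to end of line) and blank lines are ignored.
    Names are lower-cased; if a name appears more than once, the first
    entry is kept.
    """
    mapping: dict[str, str] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # malformed line: an address with no names
        ip, names = fields[0], fields[1:]
        for name in names:
            mapping.setdefault(name.lower(), ip)
    return mapping


def hosts_differences(local: str, reference: str) -> dict[str, tuple]:
    """Names whose address differs, or is missing, between two hosts files.

    Returns {name: (local_ip_or_None, reference_ip_or_None)}.
    """
    a, b = parse_hosts(local), parse_hosts(reference)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(a.keys() | b.keys())
        if a.get(name) != b.get(name)
    }
```

An empty result means the two files agree on every name; anything else is a candidate for the copy-from-another-node fix described above.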

   Check that the compute node is in the nodes file.  You may need to restart
pbs_server if you just added it.
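To confirm the node is listed, a small Python sketch that parses nodes-file text. It assumes the usual layout of one node per line, hostname first, followed by optional attributes such as np=8, with # comments; on the head node the file normally lives at /var/spool/torque/server_priv/nodes. The helper names are hypothetical.

```python
def listed_nodes(nodes_text: str) -> list[str]:
    """Return the hostnames declared in TORQUE nodes-file text.

    Each non-blank, non-comment line is assumed to start with the
    hostname, followed by optional attributes (e.g. np=8) or properties.
    """
    hosts = []
    for line in nodes_text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            hosts.append(line.split()[0])
    return hosts


def node_is_listed(nodes_text: str, hostname: str) -> bool:
    """True if hostname appears in the nodes file (case-insensitive, exact name)."""
    wanted = hostname.lower()
    return any(h.lower() == wanted for h in listed_nodes(nodes_text))
```

Because the match is on the exact name, a node entered as a short name but looked up by its fully qualified name (or vice versa) will show up as missing.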

   Check firewalls, especially on the head node (the PBS ports may be open only for
a range of addresses).
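A quick way to tell a firewall problem from a dead daemon is to attempt a TCP connection to the PBS ports in each direction. The Python sketch below assumes the stock TORQUE port numbers (15001 for pbs_server, 15002 and 15003 for pbs_mom); a site may have built or configured different ones, so treat `DEFAULT_PORTS` as an assumption. `port_open` is a hypothetical helper.

```python
import socket

# Assumed TORQUE default ports; verify against your own build and config.
DEFAULT_PORTS = {
    "pbs_server": 15001,
    "pbs_mom": 15002,
    "pbs_mom (resource monitor)": 15003,
}


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    A refused connection, a timeout (e.g. packets silently dropped by a
    firewall) or a name-resolution failure all return False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, run `port_open("headnode", 15001)` on the compute node and `port_open("node01", 15002)` on the head node (both hostnames hypothetical). A failure in only one direction suggests a filtering rule rather than a daemon that is down.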

   Make sure that pbs_server is not running on the compute node.  
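On a Linux compute node, a stray pbs_server can be spotted by scanning /proc. This sketch assumes the Linux /proc layout (it reads /proc/<pid>/comm) and takes the proc root as a parameter only so it can be exercised against a fake directory; `running_processes_named` is a hypothetical helper.

```python
from pathlib import Path


def running_processes_named(name: str, proc_root: str = "/proc") -> list[int]:
    """Return sorted PIDs whose command name equals `name`.

    Reads <proc_root>/<pid>/comm for every numeric entry. Processes that
    exit mid-scan, or whose comm file cannot be read, are skipped.
    The kernel truncates comm to 15 characters, which is enough for
    names like pbs_server and pbs_mom.
    """
    pids = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue
        try:
            comm = (entry / "comm").read_text().strip()
        except OSError:
            continue  # process vanished or file unreadable
        if comm == name:
            pids.append(int(entry.name))
    return sorted(pids)
```

On a healthy compute node, `running_processes_named("pbs_server")` would be expected to return an empty list, while `running_processes_named("pbs_mom")` returns a non-empty one.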

   Check that /var/spool/torque/mom_priv/config matches the one on the other compute nodes.
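To see exactly how a node's mom_priv/config differs from a known-good node's, a Python sketch using the standard difflib module. It assumes both files have been copied to one place and read as text; blank lines and trailing whitespace are ignored, and `config_diff` is a hypothetical helper name.

```python
import difflib


def config_diff(suspect: str, reference: str) -> list[str]:
    """Unified-diff lines between two mom_priv/config texts.

    Blank lines and trailing whitespace are ignored so only meaningful
    differences are reported; an empty list means the files match.
    """
    def normalize(text: str) -> list[str]:
        return [ln.rstrip() for ln in text.splitlines() if ln.strip()]

    return list(difflib.unified_diff(
        normalize(reference),
        normalize(suspect),
        fromfile="reference/mom_priv/config",
        tofile="suspect/mom_priv/config",
        lineterm="",
    ))
```

Lines prefixed with `-` are present only in the known-good config, and lines prefixed with `+` only on the suspect node.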

   Lastly, you may need to restart pbs_mom in case it has gotten into a corrupted state.

Good luck,

 James Coyle, PhD
 High Performance Computing Group     
 115 Durham Center            
 Iowa State Univ.           
 Ames, Iowa 50011           web: http://www.public.iastate.edu/~jjc

-----Original Message-----
From: torqueusers-bounces at supercluster.org [mailto:torqueusers-bounces at supercluster.org] On Behalf Of Zhang Yang
Sent: Wednesday, April 21, 2010 1:43 AM
To: Torque Users
Subject: [torqueusers] node status is down, but node is normal


   Our cluster runs torque 2.0.0 and maui 3.2.6, and everything works fine. However, one node
behaves strangely: 'diagnose -n' shows its status as down, but I can ssh to the node and
pbs_mom is running. Has anybody run into this problem, or can anyone give me some suggestions? Thanks!




   Lanzhou University
  Email: zhyang at lzu.edu.cn
torqueusers mailing list
torqueusers at supercluster.org
