[torqueusers] Node stays in busy state.. no cpu load.. reboot no effect

Manian, Anand (GE Energy, Non GE) anand.manian at ps.ge.com
Mon Dec 13 08:45:49 MST 2004

Hello folks,

   I have a 48-node (96 CPUs) cluster running RedHat-7.3 and OpenPBS-2.3
(patch level 2.1) with maui-3.2.6.

   This morning a compute node reported a "busy" state even though it has
no jobs and its CPU load is zero. I have tried:

a) Restarting pbs_mom on the node.
b) Raising $ideal_load and $max_load to ridiculously high values in
   /usr/spool/PBS/mom_priv/config on that node, then restarting pbs_mom
   there.
c) Rebooting the node.

None of this had any effect. Running pbsnodes -a still shows the
following for this node:

     state = busy
     np = 2
     ntype = cluster

The current CPU load is:

[root@n16 root]# uptime
 10:44am  up  1:18,  1 user,  load average: 0.00, 0.00, 0.00
[root@n16 root]# 

and the mom config file is:

[root@n16 root]# cat /usr/spool/PBS/mom_priv/config
# Node that is allowed to connect to pbs_mom as long as it talks via
# a privileged port
$clienthost pbsserver

# "Low water-mark" for load; if load average drops below this value,
# that node is deemed to be **not** busy.
$ideal_load 1.5

# The "High water-mark" for load; if load average goes above this value,
# the node is marked "busy".
$max_load   1.6

# The setting for polling time 
$poll_time   30
[root@n16 root]# 
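
Taken at face value, the comments in that file describe a simple hysteresis
between the two thresholds. The sketch below only restates that rule so the
numbers can be checked by hand; it is not pbs_mom's actual code, and the
function name next_state is invented for illustration.

```shell
# Sketch of the load rule as described by the config comments (not pbs_mom source).
# Usage: next_state <current_state> <load> <ideal_load> <max_load>
next_state() {
    awk -v cur="$1" -v load="$2" -v ideal="$3" -v max="$4" 'BEGIN {
        if (load + 0 > max + 0)        print "busy"   # above the high water-mark
        else if (load + 0 < ideal + 0) print "free"   # below the low water-mark
        else                           print cur      # in between: state unchanged
    }'
}

# This node: currently busy, load 0.00, thresholds 1.5 / 1.6
next_state busy 0.00 1.5 1.6    # prints "free"
```

By that rule a load of 0.00 should clear the busy flag, so the load
thresholds alone do not seem to explain what I am seeing.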

Has anyone seen this before? I would appreciate any input.


-Anand Manian

