[torqueusers] Node stays in busy state.. no cpu load.. reboot no effect
Manian, Anand (GE Energy, Non GE)
anand.manian at ps.ge.com
Mon Dec 13 08:45:49 MST 2004
Hello folks,
I have a 48-node (96 CPU) cluster running RedHat-7.3 and OpenPBS-2.3
(Patch level 2.1) with maui-3.2.6.
A compute node this morning reported a "busy" state in spite of there being
no jobs and its CPU load being zero. I have tried:
a) restarting pbs_mom on the node
b) changing $ideal_load and $max_load to ridiculously high numbers in
/usr/spool/PBS/mom_priv/config for that node, and restarting pbs_mom
there
c) rebooting the node concerned
Still no effect. I am seeing the following for this node
when running pbsnodes -a:
n16
state = busy
np = 2
ntype = cluster
The current CPU load is:
[root at n16 root]# uptime
10:44am up 1:18, 1 user, load average: 0.00, 0.00, 0.00
[root at n16 root]#
and mom config file is:
[root at n16 root]# cat /usr/spool/PBS/mom_priv/config
# Node that is allowed to connect to pbs_mom as long as it talks via
# a privileged port
$clienthost pbsserver
# "Low water-mark" for load; if load average drops below this value,
# that node is deemed to be **not** busy.
$ideal_load 1.5
# The "High water-mark" for load; if load average goes above this value,
# the node is marked "busy".
$max_load 1.6
# The setting for polling time
$poll_time 30
[root at n16 root]#
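To spell out how I read the two thresholds from the comments in that config,
here is a minimal sketch. It is not taken from the pbs_mom source; the
function name next_busy and its arguments are hypothetical, and the rule is
only what the comments above describe: above $max_load the node becomes busy,
below $ideal_load it stops being busy, and in between it keeps its state.

```python
def next_busy(busy, load, ideal_load=1.5, max_load=1.6):
    """Return the busy flag after one load sample (hypothetical helper).

    Assumed rule, taken only from the config comments:
      load > max_load    -> node is marked busy
      load < ideal_load  -> node is deemed not busy
      otherwise          -> previous state is kept
    """
    if load > max_load:
        return True
    if load < ideal_load:
        return False
    return busy


if __name__ == "__main__":
    # A node that is already busy, sampled at the load shown by uptime above.
    print(next_busy(busy=True, load=0.00))
```

Under this reading a load average of 0.00 should clear the busy flag at the
next poll, which is why the state reported for n16 puzzles me.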
Has anyone seen this before? I would appreciate any input.
Thanks
-Anand Manian