[torqueusers] Nodes not showing correct state

Winfried Lorenzen winfried.lorenzen at uni-rostock.de
Fri Feb 15 08:56:42 MST 2013


Hi,
I have seen the same behavior and have not found a solution yet. Our Maui
log files show:

MPBSClusterQuery(...,RCount,SC)
ERROR:    cannot get node info: End of File
ALERT:    cannot load cluster resources on RM (RM '...' failed in function 'clusterquery')
WARNING:  no resources detected

(You may have to increase the debug level to see these messages.)
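
To check how often these messages occur, the log can be scanned for them. The
following is only a minimal sketch: the patterns are copied from the excerpt
above and may not match other Maui versions, and the function name
count_rm_failures is hypothetical. It takes any iterable of lines, so an open
log file can be passed directly.

```python
import re

# Patterns copied from the log excerpt above. The exact wording is an
# assumption and may differ between Maui versions.
PATTERNS = {
    "node_info_eof": re.compile(r"ERROR:\s+cannot get node info: End of File"),
    "clusterquery_failed": re.compile(r"ALERT:\s+cannot load cluster resources on RM"),
    "no_resources": re.compile(r"WARNING:\s+no resources detected"),
}


def count_rm_failures(lines):
    """Count how many lines match each of the resource manager failure patterns."""
    counts = {name: 0 for name in PATTERNS}
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "MPBSClusterQuery(...,RCount,SC)",
        "ERROR:    cannot get node info: End of File",
        "WARNING:  no resources detected",
    ]
    print(count_rm_failures(sample))
```

If the counts rise at the same times the nodes appear stale, that would
suggest the two symptoms are related, although it would not by itself show
which side is at fault.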

We have tried torque 4.1.2 through 4.1.4 with the same result.


W. Lorenzen


On Wednesday, 13 February 2013, 18:55:28, Moye, Roger V wrote:

We are running torque 4.1.2 and maui 3.3.1.
 
When I run “diagnose -n” I see that all of my nodes report:
WARNING:  node ‘nodename’ has not been updated in 00:21:52.
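
To track which nodes are affected and for how long, warnings of this form can
be extracted from saved "diagnose -n" output. This is a rough sketch under
stated assumptions: the line format is taken from the single example above,
the elapsed time is assumed to be HH:MM:SS, and the function name stale_nodes
is hypothetical. It only parses text and does not run any command itself.

```python
import re

# Quote characters that may surround the node name (plain and typographic).
_Q = "'\"\u2018\u2019"

# Assumed line format, based on the one warning shown above:
#   WARNING:  node 'nodename' has not been updated in HH:MM:SS.
STALE_RE = re.compile(
    r"WARNING:\s+node\s+[" + _Q + r"]?(?P<node>[^\s" + _Q + r"]+)[" + _Q + r"]?"
    r"\s+has not been updated in\s+(?P<h>\d+):(?P<m>\d\d):(?P<s>\d\d)"
)


def stale_nodes(text, threshold_seconds=0):
    """Return (node, seconds_since_update) pairs at or above the threshold."""
    found = []
    for match in STALE_RE.finditer(text):
        seconds = (
            int(match.group("h")) * 3600
            + int(match.group("m")) * 60
            + int(match.group("s"))
        )
        if seconds >= threshold_seconds:
            found.append((match.group("node"), seconds))
    return found


if __name__ == "__main__":
    sample = "WARNING:  node 'node01' has not been updated in 00:21:52.\n"
    for node, seconds in stale_nodes(sample, threshold_seconds=600):
        print(node, seconds)
```

Collecting this output at regular intervals would show whether all nodes go
stale together or only some of them.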
 
I believe this is because Maui is not getting updated node status from 
torque.  But I’m not sure if this is a torque problem or maui problem so I’m 
posting to both lists.   All of the nodes are online and are responsive.  In 
many cases they are idle.  
 
If I wait long enough (maybe hours), the problem resolves itself but 
reappears a short while later.  If I reset maui, the problem is also 
resolved, but it likewise reappears a short while later.
 
As a result of this problem, maui often thinks that nodes are busy when they 
are actually idle, so jobs wait in the queue even though nodes are free.
 
Has anyone seen this problem before?
 
Thanks!
-Roger Moye
 
-----------------------------------------------------------
Roger V. Moye
Systems Analyst III
University of Texas MD Anderson Cancer Center
Division of Quantitative Sciences
FCT4.6109
Houston, Texas
-----------------------------------------------------------
  

