[torqueusers] downing a node via qmgr
Stewart.Samuels at sanofi-aventis.com
Wed Sep 21 10:22:03 MDT 2005
I have just experienced strange behaviour with qmgr. We currently have a node that is rebooting itself constantly. To take the system out of the cluster while we diagnose the problem, I issued the following command:
qmgr -c 's n node-name state=down'
For a few moments after the qmgr command is issued, subsequent "pbsnodes -a" commands show node-name as "down". But for some reason, the node then shows as "free" again.
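For anyone trying to reproduce this, the flip from "down" back to "free" can be watched with a small polling loop. What follows is only a sketch: the helper name node_state is made up for illustration, and it assumes the node listing prints each node name in column one followed by indented "key = value" lines, one of which is "state = ...".

```shell
#!/bin/sh
# Sketch only: print the "state" value for one node from node-listing
# output read on stdin. Assumes each node block starts with the node name
# in column one, followed by indented "key = value" lines.
node_state() {
    awk -v node="$1" '
        /^[^ \t]/ { current = $1 }    # unindented line: remember the node name
        current == node && $1 == "state" && $2 == "=" { print $3; exit }
    '
}

# Hypothetical usage on the server host (commented out so that the
# function can be sourced without touching the cluster):
# qmgr -c 's n node-name state=down'
# while sleep 5; do
#     printf '%s %s\n' "$(date +%T)" "$(pbsnodes -a | node_state node-name)"
# done
```

Logging a timestamp next to each state should show how long the "down" setting holds before the node reports "free" again.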
Has anyone seen this behaviour? I realize we are running a little behind on patch levels: we are running torque-1.2.0p1 and maui-3.2.6p11.
When such a failure occurs (this has happened a few times in our cluster), is there a way, other than qmgr, to temporarily remove a node without having to delete it from the server_priv/nodes file and restart pbs_server with the "-t create" argument?
Stewart Samuels
Infrastructure Evolution and Integration
Scientific and Medical Affairs
Sanofi-Aventis Pharmaceutical
1041 Route 202-206
Bridgewater, NJ 08807
(908) 231-4762
email: Stewart.Samuels at Sanofi-Aventis.com