[torqueusers] downing a node via qmgr

Stewart.Samuels at sanofi-aventis.com
Wed Sep 21 10:22:03 MDT 2005


I have just experienced strange behaviour with qmgr.  We currently have a node that is constantly rebooting itself.  To take the system out of the cluster so we can diagnose the problem, I issued the following command:

	qmgr -c 's n node-name state=down'

For a few moments after the qmgr command is issued, subsequent "pbsnodes -a" commands show node-name as "down".  But for some reason, the node then shows as "free" again.
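
A minimal sketch for watching the state flip back, offered as an illustration rather than a tested recipe.  It assumes "pbsnodes -a" prints each node name on an unindented line followed by indented "key = value" lines, one of which is "state = ...".  The helper name node_state is hypothetical and is not part of TORQUE.

```shell
# node_state NODE
# Reads pbsnodes-style output on stdin and prints the state of NODE.
# Assumption: a node record starts with the node name on an unindented
# line, and its attributes follow as indented "key = value" lines.
node_state() {
    awk -v node="$1" '
        /^[^ \t]/ { current = $1 }      # unindented line starts a new node record
        current == node && $1 == "state" {
            print $3                    # fields are: state, =, value
            exit
        }
    '
}
```

It could be run repeatedly to log when the state changes, for example:

	while true; do date; pbsnodes -a | node_state node-name; sleep 5; done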

Has anyone seen this behaviour?  We are running torque-1.2.0p1 and maui-3.2.6p11, which I realize is a little behind on patch level.

When such a failure occurs (this has happened a few times in our cluster), is there a way, other than qmgr, to temporarily remove a node without having to delete it from the server_priv/nodes file and restart pbs_server with the "-t create" argument?

               Stewart Samuels
               Infrastructure Evolution and Integration
               Scientific and Medical Affairs 
               Sanofi-Aventis Pharmaceutical              
               1041 Route 202-206			
              Bridgewater, NJ  08807

              (908) 231-4762
              email:  Stewart.Samuels at Sanofi-Aventis.com
