[torqueusers] downing a node via qmgr

Brett Ellis ellis at cs.utk.edu
Wed Sep 21 10:27:18 MDT 2005


Stewart,
   I have typically used

pbsnodes -o NODENAME

which may be the same as

qmgr -c 's n node-name state=offline'

to take problematic nodes out of service, and I have not seen them
revert to "free" afterwards.
  Brett
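
For anyone finding this thread later, here is a minimal sketch of the
offline/online cycle described above. The function names and the PBSNODES
override variable are hypothetical and added only for illustration; the
-o, -c and -l flags are the standard pbsnodes options. A likely
explanation for the difference in behaviour (an assumption, not confirmed
in this thread) is that "down" is a state the server recomputes whenever
the node's pbs_mom reports in again, whereas "offline" is an
administrative mark that stays until it is cleared.

```shell
#!/bin/sh
# Hypothetical helpers around pbsnodes. PBSNODES may be overridden,
# for example to point at a specific install path or a dry-run stub.
PBSNODES="${PBSNODES:-pbsnodes}"

# Mark a node offline so no new jobs are scheduled on it.
node_offline() {
    "$PBSNODES" -o "$1"
}

# Clear the offline mark once the node has been repaired.
node_online() {
    "$PBSNODES" -c "$1"
}

# List nodes currently marked down, offline, or unknown.
node_list_marked() {
    "$PBSNODES" -l
}
```

Typical use would be "node_offline node-name" before diagnosing the
machine, "node_list_marked" to confirm the mark is still in place, and
"node_online node-name" when the node is ready to take jobs again.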

Stewart.Samuels at sanofi-aventis.com wrote:
> I have just experienced strange behaviour with qmgr.  We currently have 
> a node which is rebooting itself constantly.  To take the system out of 
> the cluster to diagnose the problem, I issued the following command:
> 
>         qmgr -c 's n node-name state=down'
> 
> For a few moments after the qmgr command is issued, subsequent
> "pbsnodes -a" commands show node-name as "down".  But for some reason,
> the node then shows as "free" again.
> 
> Has anyone seen this behaviour?  I realize we are running a little 
> behind with the patch level, but we are running torque-1.2.0p1 and 
> maui-3.2.6p11.
> 
> When there is such a failure (this has occurred a few times in our
> cluster), is there a way (other than qmgr) of temporarily removing a
> node that does not require deleting it from the server_priv/nodes file
> and restarting pbs_server with the "-t create" argument?
> 
> Stewart Samuels
> Infrastructure Evolution and Integration
> Scientific and Medical Affairs
> Sanofi-Aventis Pharmaceutical
> 1041 Route 202-206
> Bridgewater, NJ  08807
>
> (908) 231-4762
> email:  Stewart.Samuels at Sanofi-Aventis.com

