[torqueusers] pbs_server and nodes file how to handle comments

David Golden dgolden at cp.dias.ie
Wed Mar 29 05:24:06 MST 2006


On 2006-03-28 09:58:09 -0800, Garrick Staples wrote:
> On Tue, Mar 28, 2006 at 01:46:44PM +0100, David Golden alleged:
> > > That would be a frequency of 0.  New nodes start in state unknown, get
> > > pinged, and get an addr list.  The old nodes never get the new addr list.
> > 
> > Ah.
> > 
> > Not that it's necessarily what you'd want to do (especially given your
> > large-cluster avoiding-ping concerns and maybe iffy effect on running 
> > jobs, though jobs on nodes I tested on weren't interrupted): 
> > but if you "pbsnodes -r" on the old nodes to force them state=down, 
> > do they then get the updated node list and do something useful with
> > it when they're noticed to be "back" online by the server? 
> 

> Yes, setting a node to down will trigger a ping operation and it will
> get a new addr list.
>
> This is why a cluster-wide ping operation is needed to support creating
> new nodes automatically.
>

Well, the point being that presumably one could therefore do a
"pbsnodes -r node1 node2 node3 node4 ... nodeN" after adding
the new nodes, i.e. bring every node in the cluster to
state = down so they all get the new list? For large clusters you
could do subsets at a time, too, especially if the cluster is
split into nodesets: nodes in one set mightn't need to know
about the nodes in another set immediately. Maybe that's a
sledgehammer-for-a-nail, though then again maybe not: e.g. you
mightn't want new parallel jobs issued to a node until you're
sure it has the new node list.
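
A rough sketch of the subsets-at-a-time idea, in Python. The helper
names (reset_commands, reset_nodes), the batch size and the node names
are all made up for illustration; the only thing taken from the thread
is the "pbsnodes -r <nodes>" invocation itself. By default it only
prints the commands it would run, so nothing is set down by accident.

```python
import subprocess
from typing import Iterable, List


def reset_commands(nodes: Iterable[str], batch_size: int = 64) -> List[List[str]]:
    """Build one `pbsnodes -r` argument list per batch of node names.

    batch_size is an arbitrary illustrative default, not a TORQUE limit.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    names = list(nodes)
    return [
        ["pbsnodes", "-r", *names[i:i + batch_size]]
        for i in range(0, len(names), batch_size)
    ]


def reset_nodes(nodes: Iterable[str], batch_size: int = 64, dry_run: bool = True) -> None:
    """Print (dry_run=True) or execute (dry_run=False) each batch in turn."""
    for argv in reset_commands(nodes, batch_size):
        if dry_run:
            print(" ".join(argv))
        else:
            # Raises CalledProcessError if pbsnodes exits non-zero.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Hypothetical node names; replace with the real list from the nodes file.
    reset_nodes([f"node{i}" for i in range(1, 6)], batch_size=2)
```

Passing dry_run=False would actually run the commands, one batch after
another; for a cluster split into nodesets you could call reset_nodes
once per set instead of over the whole node list.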

There's also the problem of not being able to do everything within
qmgr, but you could make node states settable within qmgr and
then do much the same thing, i.e.

create new nodes 
set all nodes down
clear new nodes offline 



