[torqueusers] pbs_server and nodes file how to handle comments
dgolden at cp.dias.ie
Wed Mar 29 05:24:06 MST 2006
On 2006-03-28 09:58:09 -0800, Garrick Staples wrote:
> On Tue, Mar 28, 2006 at 01:46:44PM +0100, David Golden alleged:
> > > That would be a frequency of 0. New nodes start in state unknown, get
> > > pinged, and get an addr list. The old nodes never get the new addr list.
> > Ah.
> > Not that it's necessarily what you'd want to do (especially given your
> > large-cluster avoiding-ping concerns and maybe iffy effect on running
> > jobs, though jobs on nodes I tested on weren't interrupted):
> > but if you "pbsnodes -r" on the old nodes to force them state=down,
> > do they then get the updated node list and do something useful with
> > it when they're noticed to be "back" online by the server?
> Yes, setting a node to down will trigger a ping operation and it will
> get a new addr list.
> This is why a cluster-wide ping operation is needed to support creating
> new nodes automatically.
Well, the point being that presumably one could therefore do a
"pbsnodes -r node1 node2 node3 node4 ... nodeN" after adding
the new nodes, i.e. bring every node in the cluster to
state = down so they all get the new list? You could do subsets
at a time, too, for large clusters (especially if the cluster
is split into nodesets: nodes in one set mightn't need to know
about the nodes in another set immediately). Maybe that's
sledgehammer-for-a-nail, though then again maybe not: e.g. you
mightn't want new parallel jobs issued to a node until you're
sure it has the new node list.
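The batched "pbsnodes -r" idea can be sketched as a small shell script. This is a sketch under stated assumptions, not a tested procedure: the nodes-file path, the batch size, and the helper names list_nodes and mark_down_in_batches are all hypothetical, so check your own PBS_HOME. It reads node names from the server's nodes file, skipping blank lines and # comments, and hands them to "pbsnodes -r" a batch at a time, which covers the "subsets at a time" case for large clusters.

```shell
#!/bin/sh
# Sketch only: force nodes to state=down with "pbsnodes -r" so that, when
# the server notices them back online, they pick up the new node list.
# Assumptions (hypothetical, adjust for your site):
#   NODES_FILE - path to the server's nodes file
#   BATCH_SIZE - how many node names to pass to one pbsnodes call
#   PBSNODES   - command to run; set to "echo pbsnodes" for a dry run
NODES_FILE=${NODES_FILE:-/var/spool/torque/server_priv/nodes}
BATCH_SIZE=${BATCH_SIZE:-16}
PBSNODES=${PBSNODES:-pbsnodes}

# Print the first field (the node name) of every line that is neither
# blank nor a comment.
list_nodes() {
    awk 'NF && $1 !~ /^#/ { print $1 }' "$1"
}

# Run "$PBSNODES -r" on the listed nodes, BATCH_SIZE names per call.
# $PBSNODES is left unquoted on purpose so "echo pbsnodes" splits into
# a command plus its first argument.
mark_down_in_batches() {
    list_nodes "$NODES_FILE" | xargs -n "$BATCH_SIZE" $PBSNODES -r
}
```

Source the script and call mark_down_in_batches, ideally first with PBSNODES="echo pbsnodes" to see exactly which pbsnodes commands would be run.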
There's also the issue of not being able to do everything within
qmgr, but you could make node states settable within qmgr and
then do much the same thing, i.e.:
  1. create the new nodes
  2. set all nodes down
  3. clear offline on the new nodes
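Until node states are settable within qmgr, the same three steps could be approximated with the existing commands. This is a hypothetical sketch: it uses qmgr's "create node" for step 1, "pbsnodes -r" (which forces state=down) for step 2 and "pbsnodes -c" (clear OFFLINE) for step 3. The function name refresh_after_adding, its two space-separated node-list arguments and the RUN dry-run prefix are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical sketch of the three-step sequence using current CLIs
# instead of a qmgr command for setting node state.
# RUN=echo prints the commands instead of running them (dry run).
RUN=${RUN:-}

# $1: space-separated names of the nodes to add
# $2: space-separated names of every node in the cluster (old and new)
refresh_after_adding() {
    new_nodes=$1
    all_nodes=$2
    # 1. Create the new nodes on the server.
    for n in $new_nodes; do
        $RUN qmgr -c "create node $n"
    done
    # 2. Force every node to state=down so each is pinged again and
    #    gets the updated address list.
    $RUN pbsnodes -r $all_nodes
    # 3. Clear OFFLINE on the new nodes, following the sequence above.
    $RUN pbsnodes -c $new_nodes
}
```

For example, RUN=echo followed by refresh_after_adding "n5 n6" "n1 n2 n5 n6" prints the qmgr and pbsnodes commands without touching the server.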