[torquedev] Inconsistent behavior in updating trusted client list
yhemanth at yahoo-inc.com
Thu Oct 11 11:03:59 MDT 2007
The torque wiki mentions that dynamic addition of nodes must be
accompanied by a restart of the pbs_server.
One of the reasons for this could be that the trusted client list on the
MOMs is not updated in a consistent manner after node addition.
The test I performed was as follows:
My original setup had one host address in the pbs_server nodes file. The
MOM on this node had 3 addresses in its trusted client list: localhost,
its own address, and the pbs_server's address.
Then, using qmgr, I added one more node. This resulted in an update to
the MOM's trusted client list. After a while, I added one more node.
However, this time no update was sent to the MOM.
Tracking this down in the code, I narrowed the problem to the function
send_cluster_addrs in server/node_manager.c. In this, it appears that
after an update is sent to all the nodes when a node is added, the
variable startcount should be reset to 0. If this is not done, the
variable, being static, retains its value even when processing a new
node and essentially steps out of the pbsndmast array without going over
the entire list again.
The following patch (on TRUNK) addresses this issue:
--- src/server/node_manager.c (revision 1572)
+++ src/server/node_manager.c (working copy)
@@ -1003,6 +1003,8 @@
+ /* reset startcount, as we've sent the updates for all servers */
+ startcount = 0;
} /* END send_cluster_addrs */
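To illustrate the effect described above, here is a minimal standalone sketch. It is not the actual TORQUE code: only the names startcount and send_cluster_addrs come from the source, and everything else (the node table, the helper functions, the apply_fix flag) is hypothetical. It assumes the send loop starts at startcount and leaves the cursor at the end of the list when it finishes.

```c
#include <assert.h>
#include <string.h>

#define MAX_NODES 8

/* Hypothetical stand-ins for the server's node table (not TORQUE's pbsndmast). */
static int node_count;               /* nodes currently known to the server      */
static int updates_seen[MAX_NODES];  /* updates each node's MOM has received     */

/* Cursor that survives across calls, modelling the static startcount. */
static int startcount;

/* Model of the send loop: resume from startcount and walk to the end. */
static void send_cluster_addrs_model(int apply_fix)
{
    int i;

    for (i = startcount; i < node_count; i++)
        updates_seen[i]++;           /* "send" the address list to node i */

    startcount = i;                  /* cursor is left at the end of the list */

    if (apply_fix)
        startcount = 0;              /* the proposed reset after a full pass */
}

/* Adding a node triggers one pass of the send loop. */
static void add_node_model(int apply_fix)
{
    assert(node_count < MAX_NODES);
    node_count++;
    send_cluster_addrs_model(apply_fix);
}

/* Start with one node, add two more, and report how many updates
 * the original node's MOM received. */
static int updates_to_first_node(int apply_fix)
{
    node_count = 1;
    startcount = 0;
    memset(updates_seen, 0, sizeof updates_seen);

    add_node_model(apply_fix);       /* second node added */
    add_node_model(apply_fix);       /* third node added  */

    return updates_seen[0];
}
```

In this model, updates_to_first_node(0) returns 1: the first addition reaches the original node, but the second pass starts past it and only reaches the newest node. With the reset, updates_to_first_node(1) returns 2, since every pass starts again from the beginning of the list.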
I am not fully familiar with the code. Can someone please verify whether my
analysis and fix are right?