[torqueusers] Inconsistent behavior in updating trusted client list of MOMs

Hemanth Yamijala yhemanth at yahoo-inc.com
Tue Oct 16 11:24:37 MDT 2007


The TORQUE wiki mentions that dynamically adding nodes must be
accompanied by a restart of pbs_server.
(Ref: http://www.clusterresources.com/wiki/doku.php?

One of the reasons for this could be that the trusted client list on the
MOMs is not updated in a consistent manner after node addition.

The test I performed was as follows:

My original setup had one host address in the pbs_server nodes file. The
MOM on this node had three addresses in its trusted client list: localhost,
its own address, and the pbs_server's address.
Then, using qmgr, I added one more node. This resulted in an update to
the MOM's trusted client list. After a while, I added another node.
This time, however, no update was sent.

Tracking this down in the code, I narrowed the problem to the function
send_cluster_addrs in server/node_manager.c. It appears that once the
update has been sent to all the nodes after a node is added, the
variable startcount should be reset to 0. If this is not done, the
variable, being static, retains its value when the next new node is
processed, and the function essentially steps out of the pbsndmast array
without going over the entire list again.
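
The suspected failure mode can be reproduced outside TORQUE with a
minimal sketch. This is not the code from node_manager.c: the names
broadcast_addrs, updates_sent, MAX_NODES and the reset_when_done flag
are hypothetical, and only the idea of a static cursor called
startcount is taken from the description above.

```c
#include <assert.h>

#define MAX_NODES 16

/* Hypothetical stand-in for the node table; counts updates per node. */
static int updates_sent[MAX_NODES];

/*
 * Simplified model of a resumable broadcast. The static cursor
 * remembers where the previous call stopped. Returns the number of
 * nodes that received an update during this call.
 */
static int broadcast_addrs(int node_count, int reset_when_done)
{
    static int startcount = 0;   /* persists across calls */
    int sent = 0;

    assert(node_count >= 0 && node_count <= MAX_NODES);

    for (; startcount < node_count; startcount++) {
        updates_sent[startcount]++;
        sent++;
    }

    /* The proposed fix: start from the first node on the next call. */
    if (reset_when_done)
        startcount = 0;

    return sent;
}
```

Without the reset, a call made after the node count grows from 2 to 3
touches only the newly added entry, because the cursor is still
pointing past the nodes that were already handled. With the reset,
every call walks the whole list.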

The following patch (on TRUNK) addresses this issue:

Index: src/server/node_manager.c

--- src/server/node_manager.c   (revision 1572)
+++ src/server/node_manager.c   (working copy)
@@ -1003,6 +1003,8 @@

+    /* reset startcount, as we've sent the updates for all servers */
+    startcount = 0;
   }     /* END send_cluster_addrs */


I am not fully familiar with the code. Can someone please verify whether
my analysis and fix are right?

