[torqueusers] Nodes too long listed as down

Garrick Staples garrick at clusterresources.com
Wed Nov 1 13:58:05 MST 2006


On Tue, Oct 31, 2006 at 12:41:54PM +0100, Julian Hagenauer alleged:
> Hi,
> I have a very strange setup :-)
> I have two identical servers, both running a torque-server and a torque-scheduler, and only one node running the mom.
> Only one server is accessible at a time, but it is periodically swapped for the other server.
> You can think of it like this:
> 
> Server1----|
>            |-----------Node
> Server2----|
> 
> The servers get switched dynamically while both are running.
> If Server1 is booted (and accessible), it takes about 15 seconds until the node gets marked as free.
> If I dynamically switch to Server2 after some time, it takes about 3:15 minutes until the node gets marked as free.
> That is far too long for my case; I want the node to be recognized as free as soon as possible.
> I have looked through the configuration options but did not find anything suitable.
> I have set the server's node_ping_rate to 5 and tested several node_check_rate values without any change in behaviour.
> On the node side I have set $status_update_time to 5 seconds, but the node is still not recognized as free any earlier.
> 
> What am I missing?

Is it the ARP cache on the node?
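
One way to test that guess, as a rough sketch only. It assumes a Linux
node (so the table is readable from /proc/net/arp) and assumes both
servers answer on the same IP address with different MAC addresses,
which the original post does not state. The addresses and the helper
names parse_arp_table and is_stale are made up for illustration.

```python
# Sketch: detect a stale ARP entry for the server address on the node.
# Input is text in the layout of Linux's /proc/net/arp (header row, then
# one entry per line: IP, HW type, flags, HW address, mask, device).

def parse_arp_table(text):
    """Map each IP address in the table to its lower-cased MAC address."""
    entries = {}
    for line in text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) >= 4:
            entries[fields[0]] = fields[3].lower()
    return entries


def is_stale(text, server_ip, expected_mac):
    """True if server_ip is cached with a MAC other than expected_mac."""
    cached = parse_arp_table(text).get(server_ip)
    return cached is not None and cached != expected_mac.lower()


# Hypothetical table contents: the node still holds Server1's MAC.
SAMPLE = """\
IP address       HW type     Flags       HW address            Mask     Device
192.168.0.10     0x1         0x2         00:16:3e:aa:bb:01     *        eth0
192.168.0.1      0x1         0x2         00:16:3e:00:00:fe     *        eth0
"""

# Hypothetical MAC of Server2, the server that has just become active.
if is_stale(SAMPLE, "192.168.0.10", "00:16:3e:aa:bb:02"):
    print("node still maps the server IP to the old MAC")
```

On a real node the table text would come from
open("/proc/net/arp").read() right after the switch. If the entry still
shows the previous server's MAC while the node is listed as down, the
ARP cache is a likely cause of the delay.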

We don't really support such configurations right now, though some HA
plans are on the table.


