[torqueusers] Nodes listed as down for too long
garrick at clusterresources.com
Wed Nov 1 13:58:05 MST 2006
On Tue, Oct 31, 2006 at 12:41:54PM +0100, Julian Hagenauer alleged:
> I have a very strange setup :-)
> I have two identical servers, both running a torque-server and a torque-scheduler, and only one node running the mom.
> Only one server is accessible at a time, but it is periodically swapped for the other server.
> Think of it like this:
> The servers get switched dynamically while both are running.
> If Server1 is booted (and accessible), it takes about 15 seconds until the node gets marked as free.
> If I dynamically switch to Server2 after some time, it takes about 3:15 minutes until the node gets marked as free.
> That is far too long for my case; I want the node to be recognized as free as soon as possible...
> I have looked through the configuration but did not find anything suitable.
> I have set the server's node_ping_rate to 5 and tested several node_check_rate values without any change in behaviour.
> On the node side I have set $status_update_time to 5 seconds, but the node is still not recognized as free any earlier.
> What am I missing?
ARP cache on the node?
We don't really support such configurations right now, though some HA
plans are on the table.
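If both servers answer on the same IP address (an assumption; the setup above does not say so), the node's ARP cache may keep mapping that IP to the first server's MAC address after the switch until the entry expires, so the mom's updates would not reach Server2 in the meantime. A minimal sketch for checking this on a Linux node, assuming the standard /proc/net/arp layout; the function names parse_arp_table and server_mac are hypothetical:

```python
# Minimal sketch, assuming a Linux node that exposes its ARP cache in /proc/net/arp.
# parse_arp_table and server_mac are hypothetical helper names.
from __future__ import annotations

import sys
from pathlib import Path


def parse_arp_table(text: str) -> dict[str, str]:
    """Map IP address -> cached MAC address from /proc/net/arp contents."""
    entries: dict[str, str] = {}
    lines = text.strip().splitlines()
    for line in lines[1:]:  # the first line is the column header
        fields = line.split()
        if len(fields) >= 4:
            ip, mac = fields[0], fields[3]  # columns: IP, HW type, Flags, HW address, ...
            entries[ip] = mac.lower()
    return entries


def server_mac(server_ip: str, arp_path: str = "/proc/net/arp") -> str | None:
    """Return the MAC currently cached for server_ip, or None if there is no entry."""
    text = Path(arp_path).read_text()
    return parse_arp_table(text).get(server_ip)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python arpcheck.py <pbs_server IP>  (script name is hypothetical)
    print(server_mac(sys.argv[1]))
```

Running it just before and just after a switch would show whether the cached MAC still belongs to the previous server while the node is being reported as down.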