[torqueusers] Nodes too long listed as down

Julian Hagenauer chaosbringer at gmx.de
Tue Oct 31 04:41:54 MST 2006

I have a very strange setup :-)
I have two identical servers both running a torque-server and a torque-scheduler, and only one node running the mom.
Only one server is accessible at a time, but it is periodically swapped out for the other server.
You can think of it like that:



The servers get switched dynamically while both are running.
If Server1 is booted (and accessible) it takes about 15 seconds till the node gets marked as free.
If I dynamically switch to Server2 after some time, it takes about 3:15 minutes till the node gets marked as free.
That is far too long for my case; I want the node to be recognized as free as soon as possible...
I have looked through the configurations, but did not find anything suitable.
I have set the server's node_ping_rate to 5 and tested several node_check_rate values without any change in behaviour.
On the node side I have set $status_update_time to 5 seconds, but the node is still not recognized as free any earlier.
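
For reference, the settings described above would be applied roughly as follows. This is only a sketch: the node_check_rate value is a placeholder (several values were tried), and the location of the mom config file depends on the installation.

```
# On each server (Server1 and Server2), via qmgr:
qmgr -c "set server node_ping_rate = 5"
qmgr -c "set server node_check_rate = 30"   # placeholder value, not the one actually used

# Check what the server currently has configured:
qmgr -c "print server"

# On the node, in the mom config file (mom_priv/config), then restart pbs_mom:
#   $status_update_time 5

# Watch the node state from the active server:
pbsnodes -a
```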

What am I missing?

Thank you,

