[torqueusers] Nodes listed as down for too long

Julian Hagenauer chaosbringer at gmx.de
Thu Nov 2 00:22:08 MST 2006


On Wed, 1 Nov 2006 13:58:05 -0700
Garrick Staples <garrick at clusterresources.com> wrote:

> On Tue, Oct 31, 2006 at 12:41:54PM +0100, Julian Hagenauer alleged:
> > Hi,
> > I have a very strange setup :-)
> > I have two identical servers, both running a torque-server and a
> > torque-scheduler, and only one node running the mom.
> > Only one server is accessible at a time, but it is periodically
> > swapped for the other server.
> > You can think of it like this:
> > 
> > Server1----|
> >            |-----------Node
> >            |
> > Server2----|
> > 
> > The servers are switched dynamically while both are running.
> > When Server1 is booted (and accessible), it takes about 15 seconds
> > until the node is marked as free.
> > If I dynamically switch to Server2 after some time, it takes about
> > 3:15 minutes until the node is marked as free.
> > That is far too long for my case; I want the node to be recognized as
> > free as soon as possible...
> > I have looked through the configuration, but did not find anything
> > suitable.
> > I have set the server's node_ping_rate to 5 and tested several
> > node_check_rate values without any change in behaviour.
> > On the node side I have set $status_update_time to 5 seconds, but the
> > node is still not recognized as free any sooner.
> > 
> > What am I missing?
> 
> Arp cache on the node?
> 
> We don't really support such configurations right now, though some HA
> plans are on the table.

Hi,
Yes, Server1, Server2 and the node are virtual machines, and the virtual
machine monitor has an ARP cache enabled so that packets get routed
correctly.
What are the HA plans? Is there a way around this, e.g. by manipulating
the ARP table or something similar?
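A minimal sketch of the inspection side of that idea, assuming the host in question runs Linux and exposes its kernel ARP cache in /proc/net/arp (the helper names parse_proc_net_arp and read_arp_cache are hypothetical, not part of Torque). It maps each resolved IP address to its cached MAC address, so one could check whether the node still holds Server1's MAC for the shared server IP after the switch. It only sees the cache of the machine it runs on, not a cache kept inside the virtual machine monitor.

```python
# Hypothetical diagnostic helper (assumes Linux and /proc/net/arp).
from typing import Dict


def parse_proc_net_arp(text: str) -> Dict[str, str]:
    """Map IP address -> cached MAC address from the text of /proc/net/arp.

    Expected columns: IP address, HW type, Flags, HW address, Mask, Device.
    """
    entries: Dict[str, str] = {}
    lines = text.strip().splitlines()
    for line in lines[1:]:  # skip the column header line
        fields = line.split()
        if len(fields) < 6:
            continue  # ignore malformed or truncated lines
        ip, flags, mac = fields[0], fields[2], fields[3]
        # Flags 0x0 marks an incomplete entry with no resolved MAC yet
        if flags == "0x0":
            continue
        entries[ip] = mac.lower()
    return entries


def read_arp_cache(path: str = "/proc/net/arp") -> Dict[str, str]:
    """Read and parse the local kernel ARP cache."""
    with open(path) as f:
        return parse_proc_net_arp(f.read())
```

On the node, read_arp_cache().get(server_ip) would return the cached MAC for a given server_ip, or None if there is no resolved entry. If the entry is stale, deleting it as root (for example with `arp -d <server-ip>` or `ip neigh del <server-ip> dev <device>`) forces a fresh ARP lookup; whether that removes the 3:15 delay in this setup is untested.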

Thank you,
Julian

