[torqueusers] Nodes too long listed as down
chaosbringer at gmx.de
Thu Nov 2 00:30:31 MST 2006
On Thu, 2 Nov 2006 08:22:08 +0100
Julian Hagenauer <chaosbringer at gmx.de> wrote:
> On Wed, 1 Nov 2006 13:58:05 -0700
> Garrick Staples <garrick at clusterresources.com> wrote:
> > On Tue, Oct 31, 2006 at 12:41:54PM +0100, Julian Hagenauer alleged:
> > > Hi,
> > > I have a very strange setup :-)
> > > I have two identical servers, both running a torque-server and a
> > > torque-scheduler, and only one node running the mom.
> > > Only one server is accessible at a time, but it gets swapped
> > > periodically with the other server.
> > > You can think of it like this:
> > >
> > > Server1----|
> > >            |-----------Node
> > >
> > > Server2----
> > >
> > > The servers get switched dynamically while both are running.
> > > If Server1 is booted (and accessible), it takes about 15 seconds until
> > > the node gets marked as free.
> > > If I dynamically switch to Server2 after some time, it takes about
> > > 3:15 minutes until the node gets marked as free.
> > > That is far too long for my case; I want the node to be recognized as
> > > free as soon as possible...
> > > I have looked through the configuration options but did not find
> > > anything.
> > > I have set the server's node_ping_rate to 5 and tested several
> > > node_check_rate values without any change in behaviour.
> > > On the node side I have set $status_update_time to 5 seconds, but the
> > > node is still not recognized as free any earlier.
> > >
> > > What am I missing?
> > Arp cache on the node?
> > We don't really support such configurations right now, though some HA
> > plans are on the table.
> Yes, Server1, Server2 and the node are virtual machines, and the virtual
> machine monitor has an ARP cache enabled, so that packets get routed
> What are HA plans? Is there a way around that, e.g. manipulating the
> ARP table or something?
> Thank you,
Sorry, I meant ARP proxy, not ARP cache... but maybe these terms mean the same thing anyway.
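
For reference, the settings described in the quoted message could be applied roughly as follows. This is only a sketch under assumptions: the node_check_rate value is a placeholder (several values were tested, none is recorded here), and the location of the mom config file depends on the installation.

```shell
# On whichever pbs_server host is currently active, using qmgr:
qmgr -c "set server node_ping_rate = 5"
# Placeholder value, for illustration only; not the value actually tested.
qmgr -c "set server node_check_rate = 10"

# On the node, the pbs_mom config file (commonly mom_priv/config under the
# TORQUE home directory; the exact path is an assumption) would contain:
#   $status_update_time 5
```

Since both servers keep their own configuration, the qmgr settings would presumably have to be made on Server1 and Server2 alike.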