[torqueusers] Re: Nodes too long listed as down

Julian Hagenauer chaosbringer at gmx.de
Wed Nov 1 08:04:46 MST 2006


On Tue, 31 Oct 2006 12:41:54 +0100
Julian Hagenauer <chaosbringer at gmx.de> wrote:

> Hi,
> I have a very strange setup :-)
> I have two identical servers, both running a torque-server and a torque-scheduler, and only one node running the mom.
> Only one server is accessible at a time, but it is periodically swapped out for the other server.
> You can think of it like that:
> 
> Server1----|
>            |-----------Node
> 
> Server2----
> 
> The servers get switched dynamically while both are running.
> If Server1 is booted (and accessible), it takes about 15 seconds until the node is marked as free.
> If I then switch to Server2 dynamically, it takes about 3 minutes 15 seconds until the node is marked as free.
> That is far too long for my case; I want the node to be recognized as free as soon as possible.
> I have looked through the configurations, but did not find anything suitable.
> I have set the server's node_ping_rate to 5 and tested several values of node_check_rate, without any change in behaviour.
> On the node side I have set $status_update_time to 5 seconds, but the node is still not recognized as free any sooner.
> 
> What am I missing?

Hi,
I found out the following.
Running tcpdump on the node's interface gives:
15:26:09.144922 IP worker1.chaosbringer.de.1023 > head.chaosbringer.de.15001: UDP, length 327
15:26:09.145228 IP worker1.chaosbringer.de.1023 > head.chaosbringer.de.15001: UDP, length 327
15:26:09.145235 IP worker1.chaosbringer.de.1023 > head.chaosbringer.de.15001: UDP, length 327
.
.
.
.
15:27:37.144870 IP worker1.chaosbringer.de.1023 > head.chaosbringer.de.15001: UDP, length 327
15:27:37.144871 IP worker1.chaosbringer.de.1023 > head.chaosbringer.de.15001: UDP, length 327
15:27:37.144871 IP worker1.chaosbringer.de.1023 > head.chaosbringer.de.15001: UDP, length 327
15:27:37.144871 IP worker1.chaosbringer.de.1023 > head.chaosbringer.de.15001: UDP, length 327
15:27:39.144973 IP worker1.chaosbringer.de.1023 > head.chaosbringer.de.15001: UDP, length 26
15:27:39.225703 arp who-has worker1.chaosbringer.de tell 192.168.1.15
15:27:39.225752 arp reply worker1.chaosbringer.de is-at 00:16:3e:46:2c:1a (oui Unknown)
15:27:39.225755 IP head.chaosbringer.de.15001 > worker1.chaosbringer.de.1023: UDP, length 26

As you can see, at 15:27:39.144973 worker1 (the worker node running torque-mom) sends a packet of just 26 bytes to head (the host running the torque server and scheduler).
Head also replies with a 26-byte packet, and from that moment on the node is listed as free on head.
So what I want is for this 26-byte worker-to-head packet to be sent more frequently, e.g. every 5 seconds.
How can I achieve that?
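To check this observation on a longer capture, the small request/reply exchange can be located mechanically. The following is a minimal Python sketch, assuming tcpdump's default one-line output exactly as shown above; the helper names (parse_udp, small_exchanges, seconds_until) are hypothetical and not part of TORQUE, and the script only analyses the capture, it does not change how often pbs_mom sends anything.

```python
import re
from datetime import datetime

# Matches tcpdump lines such as:
# 15:27:39.144973 IP worker1.chaosbringer.de.1023 > head.chaosbringer.de.15001: UDP, length 26
UDP_LINE = re.compile(
    r"^(?P<ts>\d{2}:\d{2}:\d{2}\.\d{6}) IP (?P<src>\S+) > (?P<dst>\S+): UDP, length (?P<length>\d+)$"
)

# Excerpt of the capture above (the elided part is represented by ".").
SAMPLE = [
    "15:26:09.144922 IP worker1.chaosbringer.de.1023 > head.chaosbringer.de.15001: UDP, length 327",
    ".",
    "15:27:37.144871 IP worker1.chaosbringer.de.1023 > head.chaosbringer.de.15001: UDP, length 327",
    "15:27:39.144973 IP worker1.chaosbringer.de.1023 > head.chaosbringer.de.15001: UDP, length 26",
    "15:27:39.225703 arp who-has worker1.chaosbringer.de tell 192.168.1.15",
    "15:27:39.225755 IP head.chaosbringer.de.15001 > worker1.chaosbringer.de.1023: UDP, length 26",
]


def parse_udp(lines):
    """Yield (timestamp, src, dst, length) for UDP lines; ARP and elision lines are skipped."""
    for line in lines:
        m = UDP_LINE.match(line.strip())
        if m:
            ts = datetime.strptime(m["ts"], "%H:%M:%S.%f")
            yield ts, m["src"], m["dst"], int(m["length"])


def small_exchanges(lines, size=26):
    """Return (request_time, reply_time) pairs where a `size`-byte packet
    is answered by a `size`-byte packet travelling in the opposite direction."""
    packets = list(parse_udp(lines))
    pairs = []
    for i, (ts, src, dst, length) in enumerate(packets):
        if length != size:
            continue
        for ts2, src2, dst2, length2 in packets[i + 1:]:
            if length2 == size and src2 == dst and dst2 == src:
                pairs.append((ts, ts2))
                break
    return pairs


def seconds_until(lines, size=26):
    """Seconds from the first captured UDP packet to the first `size`-byte packet, or None."""
    packets = list(parse_udp(lines))
    if not packets:
        return None
    start = packets[0][0]
    for ts, _, _, length in packets:
        if length == size:
            return (ts - start).total_seconds()
    return None


if __name__ == "__main__":
    for req, rep in small_exchanges(SAMPLE):
        print(f"26-byte request at {req.time()}, reply after {(rep - req).total_seconds():.6f} s")
    print(f"first 26-byte packet {seconds_until(SAMPLE):.6f} s after capture start")
```

Run against a full capture (one tcpdump line per list entry), this shows how far apart the 26-byte exchanges actually are, which is the interval one would want to bring down to about 5 seconds.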

Thank you,
Julian

