threefoldtech/mycelium

Advertised update interval for selected route should not depend on route hold time

LeeSmet opened this issue · 0 comments

While this was recently added, on review it turns out this should not be done like this, and instead a value of the periodic update interval should be used. For instance, consider the following system with 3 nodes A B and C, where both A and C are connected to B but not to eachother, i.e. A <-> B <-> C. Node A starts, and eventually node B starts and connects to A. B will request a route dump, which will get A to send it's local route with an interval of currently 1 minute. B inserts the route, selects it, and calculates the hold time to be 1.5 minutes. Crucially, this happened just after A's periodic update timer fired (so it won't fire again for a minute). After 45 seconds, C starts and connects to B. It requests a route table dump, gets the static route of B with an interval of 1 minute, but also the route to A with the current remaining hold time of 35 seconds. C now saves the route to A via B, with a hold time of (35 * 1.5 = 52.5 seconds). If the periodic update timer of B does not fire again within this time (which it could as its on a 60 second interval), there is a time frame where the route to A via B will be expired and retracted.

The rfc is not very clear on this but it seems that advertised routes should actually not consider the local hold time and instead use the interval of the periodic update. This makes inherent sense since interval is an upper bound on when an announcement will be repeated, and we know that it will be repeated in the next periodic update, which is at most "periodic update interval" time away. Even if the route is retracted in the meantime, it will be repeated as a retraction update (or if it isn't it expires on the receiver, which is expected).
Next to this, we should also consider the protocol parameters and how they influence eachother.
The main issue with using calculated hold times is that we don't send triggered updates if a route simply refreshes. This is also not something we generally want, since it would increase protocol traffic for no real use case.