etix/mirrorbits

Feature: Redistribute downloads in a given timeframe

dagwieers opened this issue

At Kodi we currently have to limit the update frequency for add-ons to 24 hours because the hardware is unable to handle the additional requests a higher frequency would generate (see discussion at: xbmc/xbmc#17955 (comment)).

One of the ideas to make better use of the resources (by reducing peak usage) is to balance the load by distributing the requests more evenly over time. Mirrorbits could offer clients a new update interval in order to spread requests across a given window. Ideally this would lead to a more frequent update check based on the available resources.

One idea was to include a Retry-After-like mechanism that would tell Kodi when to perform the next update check (e.g. a Next-Request-After header). This could be part of the HTTP 302 response. If the system is unable to handle any more requests (because of peak usage/resource starvation) it could also offer a Retry-After header (with HTTP 429) to postpone requests to a less busy time frame.
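To make this concrete, here is a rough sketch (plain Python, not mirrorbits code; the Next-Request-After header name and the HTTP-date format are assumptions on my part) of what the two response shapes could look like:

```python
import time
from email.utils import formatdate


def build_redirect_headers(mirror_url, next_check_hours, overloaded=False):
    """Sketch of the two proposed responses, returned as (status, headers).

    Next-Request-After is the hypothetical header proposed in this issue;
    its exact name and format are not decided.
    """
    if overloaded:
        # Peak usage: push the client away with a standard Retry-After.
        return 429, {"Retry-After": "3600"}
    next_check = time.time() + next_check_hours * 3600
    return 302, {
        "Location": mirror_url,
        # HTTP-date format, as Retry-After also allows.
        "Next-Request-After": formatdate(next_check, usegmt=True),
    }
```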

Eventually this could lead to a system that predicts an optimal frequency based on the available resources and the expected number of requests within that interval, while also keeping a margin for unexpected increases in requests.

I don't know if such a feature was ever discussed or if existing use-cases could benefit from such a mechanism. Obviously with Kodi we can influence the behaviour of the client making those requests, which is not always feasible for other mirrorbits use-cases. But such a mechanism would allow different repositories to offer a controlled and self-balancing system based on their own usage, whereas currently all Kodi repositories have a hard-coded 24h update frequency that harms reliability for end-users (because emergency fixes cannot be pushed out before primetime).

So I can see this working in various ways:

Fixed frequency (server-side control)

Every successful request would be answered with a Next-Request-After header set X hours in the future. This would allow site administrators to make the client software (i.e. Kodi) adapt to their preferred frequency, meaning they could increase or decrease the frequency depending on their infrastructure load.

In this case the load will not be distributed evenly.

Optionally, this frequency could differ per type of content, or per path (regexp), so that e.g. certain metadata is checked more frequently than other content. (I.e. one mirrorbits setup could offer 2 repositories with different use-cases, e.g. one for emergency updates, which changes only infrequently and is small, and one for regular updates.)
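The per-path idea could look like this sketch (the rule list and its regexps are made-up examples; a real implementation would read them from the mirrorbits configuration):

```python
import re

# Hypothetical per-path frequency rules: first matching regexp wins.
FREQUENCY_RULES = [
    (re.compile(r"^/emergency/"), 1),   # small, rarely changes: check hourly
    (re.compile(r"^/addons/"), 24),     # regular updates: once a day
]
DEFAULT_HOURS = 24


def interval_for_path(path):
    """Return the update interval (in hours) for a given request path."""
    for pattern, hours in FREQUENCY_RULES:
        if pattern.match(path):
            return hours
    return DEFAULT_HOURS
```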

Variable frequency (evenly redistribute)

Based on a distribution model, and specific statistics from the previous 24h or even the past week, a successful request would use Next-Request-After to redistribute the next request to low-traffic periods. In this case mirrorbits would need to track both past and future grants to ensure requests are distributed evenly in the future.

This could either be based on a fixed target frequency, or based on the existing load (with margin). In the ideal situation this would work automagically out of the box.

Backoff on peak load

In some cases we cannot avoid the load exceeding a certain threshold (e.g. one of the two nodes of a cluster fails or is put in maintenance). In this case we probably want to fail requests more aggressively by using HTTP 429 Too Many Requests with a Retry-After header set with a safe margin.
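The threshold check itself is simple; a sketch (the load/capacity numbers and the 20% margin are illustrative assumptions, not measured values):

```python
def answer_request(current_load, capacity, safety_margin=0.2):
    """Fail fast when the current load leaves no safety margin.

    current_load and capacity are in requests per second. Above
    (1 - safety_margin) * capacity, reply 429 with a Retry-After long
    enough to ride out the peak; otherwise serve the redirect as usual.
    """
    if current_load >= (1 - safety_margin) * capacity:
        # Conservative: tell the client to wait an hour before retrying.
        return 429, {"Retry-After": "3600"}
    return 302, {}
```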

In case of issues

If for whatever reason the client (i.e. Kodi) does not get a Retry-After or Next-Request-After header, it would fall back to a safe value hard-coded in the client (currently 24h in Kodi). This would ensure that the client does not inadvertently stop checking for updates.
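On the client side the fallback logic could be as simple as this sketch (delta-seconds parsing only, for brevity; a real client would also handle the HTTP-date form):

```python
FALLBACK_HOURS = 24  # Kodi's current hard-coded default


def next_check_delay(headers):
    """Derive the delay (in seconds) until the next update check from the
    response headers, falling back to the hard-coded default if neither
    header is present or parseable."""
    for name in ("Retry-After", "Next-Request-After"):
        value = headers.get(name)
        if value is not None:
            try:
                return int(value)  # delta-seconds form only
            except ValueError:
                pass  # e.g. HTTP-date form; not handled in this sketch
    return FALLBACK_HOURS * 3600
```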

I opted to use a different header (Next-Request-After) because the existing Retry-After header has a very specific meaning for redirects (i.e. it tells the client how long to wait before performing the redirect, which is not the intention in our case).

Other services do use Retry-After for this specific use-case with HTTP 200, while others use a custom header (e.g. osvc-crest-next-request-after) for the same purpose.