decrypto-org/spider

Extend DB Results Cache to Pool

Closed this issue · 1 comment

To always provide the network with a collection of entries from different baseUrls, we extend the cache so that it always holds at least a certain number of entries, which can then be passed to the network. Those entries should be "random" enough that we do not bombard a single server with 100 simultaneous requests, which could lead to the detection and ban of our scraper. To prevent that, we can additionally add a rate limit within the network class that allows no more than 4 simultaneous connections per server (the Firefox default). If 4 connections are already open, a 5th entry received from the pool is stalled (introducing a backlog queue in the network module) and executed once one of the four pending requests returns. The logic works like this: once a request returns, we first check whether a stalled request exists for this domain. If so, we use it; otherwise, we get a new entry from the pool. The pool itself is responsible for always holding enough entries. A sketch of this dispatch logic follows below.
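
A minimal sketch of the per-host limit plus backlog, under the assumption of a Node/TypeScript setup; the identifiers (`NetworkEntry`, `Pool`, `Network`, `MAX_CONNECTIONS_PER_HOST`) are illustrative and not necessarily the actual names in this repository:

```typescript
// Sketch only - names below are hypothetical, not the real module API.

interface NetworkEntry {
    baseUrl: string; // host the request goes to
    path: string;    // resource to fetch on that host
}

/** Hands out entries spread across different baseUrls and keeps itself filled. */
interface Pool {
    getEntry(): NetworkEntry | undefined;
}

const MAX_CONNECTIONS_PER_HOST = 4; // Firefox default per-host connection limit

class Network {
    // Currently open connections per baseUrl
    private openConnections = new Map<string, number>();
    // Backlog queue: stalled entries per baseUrl, waiting for a free slot
    private backlog = new Map<string, NetworkEntry[]>();

    constructor(private pool: Pool) {}

    /** Try to dispatch an entry coming from the pool. */
    dispatch(entry: NetworkEntry): void {
        const open = this.openConnections.get(entry.baseUrl) ?? 0;
        if (open >= MAX_CONNECTIONS_PER_HOST) {
            // Already 4 connections open for this host: stall the request
            const queue = this.backlog.get(entry.baseUrl) ?? [];
            queue.push(entry);
            this.backlog.set(entry.baseUrl, queue);
            return;
        }
        this.openConnections.set(entry.baseUrl, open + 1);
        this.execute(entry);
    }

    /** Placeholder for the actual HTTP request. */
    private async execute(entry: NetworkEntry): Promise<void> {
        try {
            // ... perform the request against entry.baseUrl + entry.path ...
        } finally {
            this.onRequestFinished(entry.baseUrl);
        }
    }

    /** Once a request returns: prefer a stalled request for the same domain,
     *  otherwise ask the pool for a fresh entry. */
    private onRequestFinished(baseUrl: string): void {
        const open = this.openConnections.get(baseUrl) ?? 1;
        this.openConnections.set(baseUrl, open - 1);

        const queue = this.backlog.get(baseUrl);
        if (queue && queue.length > 0) {
            this.dispatch(queue.shift()!);
            return;
        }
        const next = this.pool.getEntry(); // pool guarantees it holds enough entries
        if (next) {
            this.dispatch(next);
        }
    }
}
```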

The pool now lives within the network module, since the network needs to handle the back-off cache anyway. That way, the pool logic sits in the same location, which also makes sense from an "interface" perspective: otherwise the two modules would be tightly coupled to keep caching and pooling working well together.
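
As a rough layout sketch (member names are assumptions for illustration, types reused from the sketch above), co-locating the pool and the back-off cache inside the network module could look like this:

```typescript
// Hypothetical layout: pool and back-off cache side by side in the network
// module, so the DB side only needs to expose a plain results-cache interface.
class NetworkModule {
    private pool: Pool;                                   // refills itself from the DB results cache
    private backOff = new Map<string, number>();          // baseUrl -> time until which we avoid this host
    private backlog = new Map<string, NetworkEntry[]>();  // stalled requests per baseUrl

    constructor(pool: Pool) {
        this.pool = pool;
    }

    // ... dispatch/backlog logic as sketched above ...
}
```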
Closing this issue for now.