decrypto-org/spider

Large number of paths on the same host leads to OOM condition


We found a website that lists every bitcoin transaction and block that ever occurred, along with additional information about them. In this case our randomization of the entries no longer works as a load distributor: entries are continuously taken from the pool and added to the queuedRequests buffer, where they are kept for later requests, so the buffer fills up rapidly. Since about 3/4 of the stored paths in the database came from this single host, almost the complete database was held in memory after a few iterations.
Suggestion: We change the way we handle the case where all concurrent request slots for a host are already in use: we adjust the database query so that it excludes all baseurls that currently have the maximum number of concurrent requests running.
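
A minimal sketch of what that could look like, assuming an in-memory counter of in-flight requests per base URL and a relational `paths` table with a `baseUrlId` column (all names, the schema, and the per-host limit are hypothetical, not the project's actual code):

```typescript
// Sketch only: names, schema, and limits below are assumptions for
// illustration, not the project's actual implementation.

const MAX_CONCURRENT_PER_HOST = 4; // assumed per-host request limit

// Hypothetical in-memory counter of in-flight requests per base URL id.
const inFlight = new Map<number, number>();

// Collect the ids of hosts whose concurrency budget is exhausted.
function saturatedBaseUrlIds(): number[] {
  return Array.from(inFlight.entries())
    .filter(([, count]) => count >= MAX_CONCURRENT_PER_HOST)
    .map(([id]) => id);
}

// Build a parameterized SELECT that skips paths belonging to saturated
// hosts, so the pool never loads entries it cannot dispatch anyway.
// (SQLite-style RANDOM(); adjust for the actual database engine.)
function buildPathQuery(limit = 100): { sql: string; params: number[] } {
  const excluded = saturatedBaseUrlIds();
  if (excluded.length === 0) {
    return {
      sql: "SELECT * FROM paths ORDER BY RANDOM() LIMIT ?",
      params: [limit],
    };
  }
  const placeholders = excluded.map(() => "?").join(", ");
  return {
    sql:
      `SELECT * FROM paths WHERE baseUrlId NOT IN (${placeholders}) ` +
      "ORDER BY RANDOM() LIMIT ?",
    params: [...excluded, limit],
  };
}
```

With a query like this, queuedRequests only ever receives entries that can actually be dispatched, so memory usage stays bounded by the concurrency limits rather than by the size of the largest host in the database.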