EpicWink/proxpi

Increase download speed with multi-connection request

MrT3acher opened this issue · 3 comments

Downloads would be much faster if the _download_file method were changed to support multi-connection downloads.

def _download_file(self, url: str, path: str):

Problem definition

Downloads are slow on my network. (I believe others may experience this too.)

Current behaviour and workarounds

Proposed solution (optional)

Make the _download_file method use multiple connections.

Here is an example:
http://stackoverflow.com/questions/13973188/ddg#13973531

Right now, the downloading of a file will first try to download from the source index, waiting some time (default up to 0.9 seconds) before returning the downloaded file to the client. If the file is not downloaded within that time, the client is redirected to download directly from the source index.
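Roughly, the flow described above looks like the following simplified sketch. This is not the actual proxpi code: `get_file` and the `download` callable are made-up names for illustration, and only the 0.9 second default comes from the behaviour described here.

```python
import concurrent.futures
import time

# Shared worker pool so a slow download keeps running after the client is redirected
_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def get_file(url, download, timeout=0.9):
    """Wait up to ``timeout`` seconds for ``download(url)`` to finish.

    ``download`` is a hypothetical callable that fetches the file into the
    cache and returns its local path. Returns ("local", path) if it finished
    in time, otherwise ("redirect", url) so the client fetches the file from
    the source index itself.
    """
    future = _pool.submit(download, url)
    try:
        path = future.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        # The background download is not cancelled; it can still fill the cache
        return ("redirect", url)
    return ("local", path)
```

With a large timeout the redirect branch is effectively never taken, so the client always waits for the proxy's own download to finish.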

pip will then download the file and move on to the next dependency (if any), which starts the process again. This means that on slow connections, multiple files will already be downloading concurrently: both pip and proxpi download the same file, along with any later large files which are also too slow to be downloaded in time.

In summary, I don't think a parallelised single-file download will achieve much speedup. In addition, this increased load on PyPI is one of the main things I'm trying to avoid with proxpi.

The main problem is that on my network I can't access pypi.org and its indexes directly, so I use proxpi as a proxy-cache server. I set PROXPI_DOWNLOAD_TIMEOUT to a very large number to avoid redirecting the user to the main pypi.org index URL, but with that setting proxpi downloads the Python packages slowly. I tested downloading the same packages with a download manager and it was very fast (concurrent connections speed up the download).

This feature could be controlled with an environment variable such as PROXPI_DOWNLOAD_CONNECTIONS, with a default value of 1, meaning no concurrency.
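A rough sketch of what I have in mind, assuming the source index answers HTTP range requests. The names `split_ranges`, `fetch_range` and `download_file` are hypothetical and not proxpi's code, and I use the standard library's `urllib` here only to keep the sketch self-contained.

```python
import concurrent.futures
import os
import urllib.request
from typing import List, Tuple

# Proposed (not yet existing) setting; 1 means a plain single-connection download
CONNECTIONS = int(os.environ.get("PROXPI_DOWNLOAD_CONNECTIONS", "1"))


def split_ranges(size: int, parts: int) -> List[Tuple[int, int]]:
    """Split ``size`` bytes into at most ``parts`` inclusive (start, end) ranges."""
    if size <= 0:
        return []
    parts = max(1, min(parts, size))
    chunk, extra = divmod(size, parts)
    ranges, start = [], 0
    for i in range(parts):
        length = chunk + (1 if i < extra else 0)  # spread the remainder
        ranges.append((start, start + length - 1))
        start += length
    return ranges


def fetch_range(url: str, start: int, end: int) -> bytes:
    """Fetch one byte range; fail if the server ignores the Range header."""
    request = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
    with urllib.request.urlopen(request) as response:
        if response.status != 206:
            raise RuntimeError("server did not honour the Range header")
        return response.read()


def download_file(url: str, path: str, connections: int = CONNECTIONS) -> None:
    """Download ``url`` to ``path``, using ranged requests when possible."""
    size, ranged = 0, False
    if connections > 1:
        head = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(head) as response:
            size = int(response.headers.get("Content-Length", 0))
            ranged = response.headers.get("Accept-Ranges") == "bytes"
    if not ranged or size <= 0:
        # Default path: one ordinary request, no concurrency
        urllib.request.urlretrieve(url, path)
        return
    ranges = split_ranges(size, connections)
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(ranges)) as pool:
        # map() yields results in range order, so the parts are written in sequence
        chunks = pool.map(lambda r: fetch_range(url, *r), ranges)
        with open(path, "wb") as f:
            for chunk in chunks:
                f.write(chunk)
```

This keeps whole parts in memory before writing them, which is fine for a sketch but probably not for very large files. With the default of 1 the behaviour is the same as a normal single request.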

If you are OK with this, I can implement it and open a merge request.

Feel free to implement this. If the default is to not perform a ranged request, I'm fine with that.