shilad/wikibrain

The downloader should use a multithreaded implementation for better performance

monkey2000 opened this issue · 3 comments

The downloader should use a multithreaded implementation for better performance.

Thanks for the suggestion! It's easy to do this, but last time I checked, Wikipedia's servers blocked you if you tried to open multiple download streams simultaneously. Do you have reason to believe this isn't true?

The official repo says the maximum is 2 connections per IP. Also, there are many mirrors. I think fetching from multiple mirrors can solve this problem.
See also: http://dumps.wikimedia.org/

The mirror list can be found here: http://dumps.wikimedia.org/mirrors.html
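
A rough sketch of the multi-mirror idea, in Python for brevity; it is not WikiBrain code. It spreads files across mirrors round-robin and caps concurrent connections per host at 2, the limit mentioned above. The names `assign_mirrors`, `download_all`, and `fetch_to_file` are hypothetical, and it assumes every mirror serves the same relative paths, which may not hold for all of them.

```python
import shutil
import threading
from concurrent.futures import ThreadPoolExecutor
from itertools import cycle
from urllib.parse import urlparse
from urllib.request import urlopen

MAX_PER_HOST = 2  # per-IP connection limit quoted in this thread


def assign_mirrors(paths, mirrors):
    """Pair each relative dump path with a mirror base URL, round-robin.

    Assumes all mirrors expose the same directory layout (unverified).
    """
    return [mirror.rstrip("/") + "/" + path.lstrip("/")
            for path, mirror in zip(paths, cycle(mirrors))]


def fetch_to_file(url, dest):
    """Stream one URL to a local file."""
    with urlopen(url) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)
    return dest


def download_all(urls, fetch, max_per_host=MAX_PER_HOST):
    """Call fetch(url) for every URL in parallel.

    A semaphore per host keeps the number of simultaneous calls to any
    single host at or below max_per_host. Results keep the input order.
    """
    hosts = {urlparse(u).netloc for u in urls}
    limits = {h: threading.BoundedSemaphore(max_per_host) for h in hosts}

    def guarded(url):
        with limits[urlparse(url).netloc]:
            return fetch(url)

    workers = max(1, max_per_host * len(hosts))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(guarded, urls))
```

To use it, pass `download_all` the URLs from `assign_mirrors` plus a one-argument callable that wraps `fetch_to_file` with a destination path. A real downloader would also need retries and a fallback to another mirror when a file is missing.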