suyashb95/WiktionaryParser

Optimization when scraping the same page for multiple languages

C0rn3j opened this issue · 0 comments

At the moment I have this simple scraper - https://haste.c0rn3j.com/ahiyofahuf.py

It takes a word and scrapes it in two languages. However, this seems to send two requests to Wiktionary instead of just one, even though both lookups request the same page.

Is there a way to scrape both languages in one request, to make the process faster and reduce the load on Wiktionary?

EDIT: Assuming this is not currently implemented.

The parser could save whole pages to /tmp/WiktionaryParser/. On every decent distro /tmp/ gets cleaned on reboot, and on most distros it should be a tmpfs (RAM-backed storage).

The parser would then just check /tmp to see whether the file is already there and no older than, let's say, 24 hours (user configurable?), and either reuse it or download the page again.

I think this should be user-configurable behavior, since scraping XXk pages can take a lot of memory.
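A minimal sketch of the check described above, assuming nothing about WiktionaryParser's internals. The names `cached_page`, `download`, `max_age` and `cache_dir` are all hypothetical; `download` stands in for whatever function actually fetches the page, and passing `max_age=None` is one possible way to let users turn caching off.

```python
import hashlib
import os
import tempfile
import time
from typing import Callable, Optional

# Hypothetical cache location, following the /tmp/WiktionaryParser/ idea.
CACHE_DIR = os.path.join(tempfile.gettempdir(), "WiktionaryParser")


def cached_page(word: str,
                download: Callable[[str], str],
                max_age: Optional[float] = 24 * 60 * 60,
                cache_dir: str = CACHE_DIR) -> str:
    """Return the raw page for `word`, downloading only if no fresh copy exists.

    `download` is a hypothetical callable (word -> page text).
    `max_age` is in seconds; None disables the cache entirely.
    """
    if max_age is None:
        return download(word)

    os.makedirs(cache_dir, exist_ok=True)
    # Hash the word so any character is safe to use in a file name.
    name = hashlib.sha256(word.encode("utf-8")).hexdigest() + ".html"
    path = os.path.join(cache_dir, name)

    try:
        age = time.time() - os.path.getmtime(path)
        if age <= max_age:
            with open(path, encoding="utf-8") as f:
                return f.read()
    except OSError:
        pass  # No cached copy (or it is unreadable): download below.

    text = download(word)
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    return text
```

With something like this, a second lookup of the same word in another language would read the saved page instead of sending a new request. How the saved text would be handed to the parser is left open here, since that depends on how the library is structured.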

If implemented, it should be mentioned in the README.