Web scraper for extracting the numerals of all languages from languagesandnumbers.com for later analysis. Saves them in a readable .csv format.
- See requirements.txt
- Python 3.9+
Scrapes the numerals of all 251 languages listed at languagesandnumbers.com. The scraped numerals are saved to a CSV file in the folder you select, and the file can be opened in any editor for later analysis. A progress bar indicates how many pages are left.
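The scrape-then-save flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in web-scraper-all.py: the names `NumeralTableParser`, `parse_numerals` and `write_csv` are hypothetical, and the two-column table layout (number, numeral) is an assumption about the page structure. To stay self-contained it works on HTML strings that are already downloaded and reports progress as a plain counter on stderr.

```python
import csv
import sys
from html.parser import HTMLParser


class NumeralTableParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None


def parse_numerals(html):
    """Return a list of table rows, each a list of cell strings."""
    parser = NumeralTableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def write_csv(pages, out):
    """Write one CSV line per numeral.

    pages: dict mapping a language name to the HTML of its page (assumed layout).
    out:   writable text file object.
    """
    writer = csv.writer(out)
    writer.writerow(["language", "number", "numeral"])
    total = len(pages)
    for done, (language, html) in enumerate(pages.items(), start=1):
        for row in parse_numerals(html):
            if len(row) == 2:  # skip rows that do not match the assumed layout
                writer.writerow([language, *row])
        # Simple textual progress indicator on stderr.
        print(f"\r{done}/{total} pages processed", end="", file=sys.stderr, flush=True)
    print(file=sys.stderr)
```

When writing to a real file, open it with `newline=""` as the `csv` module documentation recommends, for example `open("numerals.csv", "w", newline="", encoding="utf-8")`.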
[Note: The releases are outdated. I will update them once I have finished most of the items listed in the TODOs. For now, please build the project manually.]
- Download the .exe file from the Releases tab. Double-click to execute.
- Download and unzip source code or clone the repository with
git clone https://github.com/mrtnbm/Web-Scraper-Public-.git
- Install Python 3.9+
sudo apt install python3.9
- Optionally update pip, setuptools, wheel:
python3 -m pip install --upgrade pip setuptools wheel
- Install requirements
pip install -r requirements.txt
- Start the script with
python3 web-scraper-all.py
or, on Windows, with
python web-scraper-all.py
- Execute
pyinstaller -wF web-scraper-all.py
python test-web-scraper-all.py
- Main window for changing settings and selecting a folder to save the CSV file
- Secondary window for viewing the progress of the script
- Test cases for all functions (achieve coverage >= 75%)
- Refactor main (more separate functions, less code in main)
- Refactor to meet OOP standards
- Fix all code smells
- Redirect artifact uploads so that builds are deployed outside of the repository