Numeral-Web-Scraper

Web scraper that extracts the numerals of all languages from languagesandnumbers.com for later analysis and saves them in a readable .csv format.

Requirements

See requirements.txt
Python 3.9+
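If you are unsure which interpreter you have, the following minimal sketch checks the Python 3.9+ requirement before you install anything. The helper name python_is_supported is hypothetical and not part of the repository.

```python
import sys

# Minimum interpreter version stated in the Requirements section.
MIN_VERSION = (3, 9)


def python_is_supported(version_info=sys.version_info):
    """Return True if the given (major, minor, ...) version is at least 3.9."""
    return tuple(version_info[:2]) >= MIN_VERSION


if __name__ == "__main__":
    if not python_is_supported():
        sys.exit(f"Python {MIN_VERSION[0]}.{MIN_VERSION[1]}+ is required, "
                 f"found {sys.version.split()[0]}")
    print("Python version OK")
```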

Function

Scrapes all numerals listed on languagesandnumbers.com for all 251 languages. The scraped numerals are saved to a CSV file in the chosen path, which can be opened in any editor for later analysis. A progress bar shows how many web pages are left.
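A minimal sketch of the parse-and-save step, using only the standard library (html.parser and csv). It assumes that each language page lists its numerals in an HTML table with one row per number, the number in the first cell and the numeral in the second. The real page layout, and the actual code in web-scraper-all.py, may differ; the names NumeralTableParser, extract_numerals and write_csv are hypothetical.

```python
import csv
import io
from html.parser import HTMLParser


class NumeralTableParser(HTMLParser):
    """Collect the text of the <td> cells of every table row.

    Assumption: numerals sit in a table, one row per number. Header cells
    (<th>) are ignored, so header rows produce no output.
    """

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being parsed
        self._cell = None  # text fragments of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip rows without data cells
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def extract_numerals(html):
    """Return (number, numeral) pairs from rows with at least two cells."""
    parser = NumeralTableParser()
    parser.feed(html)
    parser.close()
    return [(row[0], row[1]) for row in parser.rows if len(row) >= 2]


def write_csv(language, pairs, stream):
    """Write one CSV line per numeral: language, number, numeral."""
    writer = csv.writer(stream)
    for number, numeral in pairs:
        writer.writerow([language, number, numeral])


if __name__ == "__main__":
    page = ("<table><tr><th>Number</th><th>Numeral</th></tr>"
            "<tr><td>1</td><td>one</td></tr>"
            "<tr><td>2</td><td>two</td></tr></table>")
    buf = io.StringIO()
    write_csv("English", extract_numerals(page), buf)
    print(buf.getvalue(), end="")
```

When writing to a real file, open it with open(path, "w", newline="", encoding="utf-8") so the csv module controls line endings and non-ASCII numerals are stored correctly.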

Execution

Binary

[Note: The releases are outdated. I will update them once I have finished most of the items listed in the TODO section. For now, please build the project manually.]

Download the .exe file from the Releases tab and double-click it to run.

Use/build from Source

Download and unzip the source code, or clone the repository with git clone https://github.com/mrtnbm/Web-Scraper-Public-.git
Install Python 3.9+, e.g. on Debian/Ubuntu: sudo apt install python3.9
Optionally, update pip, setuptools and wheel: python3 -m pip install --upgrade pip setuptools wheel
Install the requirements: pip install -r requirements.txt
Start the script with python3 web-scraper-all.py (or python web-scraper-all.py on Windows).

Build binary yourself

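Install PyInstaller if needed (pip install pyinstaller), then execute pyinstaller -wF web-scraper-all.py. The -F flag bundles everything into a single executable and -w suppresses the console window; the binary is written to the dist folder.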

Run tests

python test-web-scraper-all.py
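The contents of test-web-scraper-all.py are not shown here. As an illustration of the kind of test that counts toward the coverage goal in the TODO list, here is a minimal unittest sketch for a hypothetical helper numerals_to_csv that renders scraped numerals as CSV text; neither the helper nor the test class is taken from the repository.

```python
import csv
import io
import unittest


def numerals_to_csv(language, pairs):
    """Hypothetical helper: render (number, numeral) pairs as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for number, numeral in pairs:
        writer.writerow([language, number, numeral])
    return buf.getvalue()


class NumeralsToCsvTest(unittest.TestCase):
    def test_one_line_per_numeral(self):
        text = numerals_to_csv("English", [("1", "one"), ("2", "two")])
        self.assertEqual(text, "English,1,one\nEnglish,2,two\n")

    def test_commas_are_quoted(self):
        # A numeral containing a comma must not be split into extra columns.
        text = numerals_to_csv("X", [("1000", "a, b")])
        row = next(csv.reader(io.StringIO(text)))
        self.assertEqual(row, ["X", "1000", "a, b"])

    def test_empty_input(self):
        self.assertEqual(numerals_to_csv("English", []), "")


if __name__ == "__main__":
    unittest.main()
```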

GUI

Main window: change settings and select the folder in which the CSV file is saved
Secondary window: shows the progress of the script

TODO

Test cases for all functions (reach coverage >= 75%)
Refactor main (more separate functions, less code in main)
Refactor to meet OOP standards
Fix all code smells
Redirect artifact uploads to a deployment target outside the repository

mrtnbm/Web-Scraper-Public-