- Python 3.10 or higher
- Poetry package manager
- ICU library, e.g. the libicu-dev package on Ubuntu
Experimental parser that loads words from the Czech Wiktionary.
Obtain the latest cs dump from https://meta.wikimedia.org/wiki/Data_dumps or https://dumps.wikimedia.org/backup-index.html
poetry run python3 01_create-wordlist-from-wiktionary.py
# creates words, words_uniq, words_to_lemmas
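
For orientation, here is a minimal sketch of how entry titles could be streamed out of a MediaWiki XML export. It is not the code in 01_create-wordlist-from-wiktionary.py; the function names `local_name` and `iter_main_titles` and the inline sample are hypothetical, and it assumes the usual export layout in which each `<page>` carries a `<title>` and an `<ns>` element (namespace 0 being regular entries).

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the XML namespace prefix, e.g. '{http://...}title' -> 'title'."""
    return tag.rsplit("}", 1)[-1]


def iter_main_titles(source):
    """Yield titles of namespace-0 pages from a MediaWiki XML export stream.

    Hypothetical helper for illustration only.
    """
    title, ns = None, None
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "ns":
            ns = elem.text
        elif name == "page":
            if ns == "0" and title:
                yield title
            title, ns = None, None
            elem.clear()  # drop the page's children to keep memory use low


# Tiny hand-written sample standing in for a real dump (assumption).
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page><title>pes</title><ns>0</ns></page>
  <page><title>Wikislovník:Portál</title><ns>4</ns></page>
  <page><title>kočka</title><ns>0</ns></page>
</mediawiki>""".encode("utf-8")

titles = list(iter_main_titles(io.BytesIO(SAMPLE)))
print(titles)  # ['pes', 'kočka']
```

For a real, compressed dump, the same function could be fed a file object from `bz2.open(path, "rb")`, where `path` points at the downloaded file.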