Tidies up the JSON file initially provided by Kaikki.
This takes the data provided by kaikki.org and tidies it up so it's easier to use in other projects.
It was created specifically for Russian, but you can modify the code to handle other languages.
These scripts require Python 3.9.6 or newer and Node.js 14.16.1 or newer.
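If you want to confirm the versions before starting, something like the following could do it. This is only a sketch and not part of the repository: the helper names (`parse_version`, `node_version`, `check_requirements`) are made up for illustration, and it assumes Node.js is reachable as `node` on your PATH.

```python
import re
import shutil
import subprocess
import sys

# Minimum versions as stated in this README.
MIN_PYTHON = (3, 9, 6)
MIN_NODE = (14, 16, 1)


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a string such as 'v14.16.1' into a tuple such as (14, 16, 1)."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def node_version():
    """Return the installed Node.js version as a tuple, or None if node is not on PATH."""
    if shutil.which("node") is None:
        return None
    result = subprocess.run(
        ["node", "--version"], capture_output=True, text=True, check=True
    )
    return parse_version(result.stdout)


def check_requirements() -> list[str]:
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    if sys.version_info[:3] < MIN_PYTHON:
        problems.append(f"Python is older than {'.'.join(map(str, MIN_PYTHON))}")
    node = node_version()
    if node is None:
        problems.append("Node.js was not found on PATH")
    elif node < MIN_NODE:
        problems.append(f"Node.js is older than {'.'.join(map(str, MIN_NODE))}")
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print(problem)
```

Run it as a script; if it prints nothing, both requirements are met.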
- Download this massive JSON file (~13GB) from Kaikki.
Or download the compressed version (~1.5GB) and extract it.
Either way, you'll now have a file called raw-wiktextract-data.json.
- Download the repository, clone it, whatever.
- Move raw-wiktextract-data.json to the Step 1 directory.
- Run extract-language.py.
- Move ru-wikiextract.json to the Step 2 directory.
- Run extract-lemmas.py.
- Move ru-lemmas.json to the Step 3 directory.
- Run tidy-up.js.
Inside the Step 3 directory, you should now have a file called ru-en-wiktionary-dict.json.
This file should contain everything in a neat layout.
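If you want to adapt the language-extraction step to another language, the general idea can be sketched as below. This is not the repository's extract-language.py, just an illustration under two assumptions: that the raw Kaikki file holds one JSON object per line, and that each entry carries a `lang_code` field such as `"ru"`. Check a few lines of your download before relying on either. The function names are hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_language_entries(lines: Iterable[str], lang_code: str = "ru") -> Iterator[dict]:
    """Yield entries whose lang_code matches, reading one line at a time.

    Streaming line by line avoids loading a ~13GB file into memory.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        entry = json.loads(line)
        # Assumption: entries expose their language as "lang_code".
        if entry.get("lang_code") == lang_code:
            yield entry


def extract_language(src_path: str, dst_path: str, lang_code: str = "ru") -> int:
    """Copy matching entries from src_path to dst_path; return how many were kept."""
    kept = 0
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for entry in iter_language_entries(src, lang_code):
            # ensure_ascii=False keeps Cyrillic text readable in the output.
            dst.write(json.dumps(entry, ensure_ascii=False) + "\n")
            kept += 1
    return kept
```

For example, `extract_language("raw-wiktextract-data.json", "uk-entries.json", "uk")` would keep only the entries tagged `uk`; the output name here is arbitrary. The later scripts may expect a different layout, so they would likely need matching changes.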