tidy-kaikki-russian

Tidies up the JSON file initially provided by Kaikki.

Primary Language: Python

This project takes the data provided by kaikki.org and tidies it up so it's easier to use in other projects.

It was created specifically for Russian, but you can modify the code to handle other languages.

These scripts require Python 3.9.6 or newer and Node.js 14.16.1 or newer.
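
If you want to confirm both requirements before starting, a small check along these lines may help. It is only a sketch and not part of the repository: `parse_version` and `check_versions` are hypothetical helper names, and it assumes the `node` executable is on your PATH and prints its version in the usual `v14.16.1` form.

```python
import re
import shutil
import subprocess
import sys

# Minimum versions stated in this README.
MIN_PYTHON = (3, 9, 6)
MIN_NODE = (14, 16, 1)


def parse_version(text):
    """Turn a string such as 'v14.16.1' into a tuple such as (14, 16, 1)."""
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def check_versions():
    """Return a list of problems found; an empty list means both checks passed."""
    problems = []
    if sys.version_info[:3] < MIN_PYTHON:
        problems.append("Python is older than 3.9.6")
    node = shutil.which("node")  # hypothetical assumption: node is on PATH
    if node is None:
        problems.append("Node.js was not found on PATH")
    else:
        result = subprocess.run([node, "--version"], capture_output=True, text=True)
        if parse_version(result.stdout) < MIN_NODE:
            problems.append("Node.js is older than 14.16.1")
    return problems


if __name__ == "__main__":
    for message in check_versions() or ["Python and Node.js versions look fine"]:
        print(message)
```

Versions are compared as tuples of integers because comparing them as strings would rank `14.9.0` above `14.16.1`.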

Instructions

  1. Download the full JSON file (~13GB) from Kaikki,
    or download the compressed version (~1.5GB) and extract it.
    Either way, you'll end up with a file called raw-wiktextract-data.json.
  2. Download or clone this repository.
  3. Move raw-wiktextract-data.json to the Step 1 directory.
  4. Run extract-language.py.
  5. Move ru-wikiextract.json to the Step 2 directory.
  6. Run extract-lemmas.py.
  7. Move ru-lemmas.json to the Step 3 directory.
  8. Run tidy-up.js.
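
To give a feel for what the language-extraction step involves, here is a minimal sketch of filtering the raw file down to one language. It is not the code in extract-language.py. The function name `extract_language` is hypothetical, and the sketch assumes the raw file holds one JSON object per line, each with a `lang_code` field (`ru` for Russian), and that writing the matches back out one per line is acceptable. Check the actual script for the format it really produces.

```python
import json


def extract_language(src_path, dst_path, lang_code="ru"):
    """Copy entries whose 'lang_code' matches from src_path to dst_path.

    Assumes one JSON object per line. The input is read line by line so
    the ~13GB file never has to fit in memory all at once.
    Returns the number of entries kept.
    """
    kept = 0
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            entry = json.loads(line)
            if entry.get("lang_code") == lang_code:
                # ensure_ascii=False keeps Cyrillic text readable in the output
                dst.write(json.dumps(entry, ensure_ascii=False) + "\n")
                kept += 1
    return kept
```

Under the same assumptions, calling `extract_language("raw-wiktextract-data.json", "ru-wikiextract.json")` would keep only the Russian entries, and passing a different `lang_code` is the kind of change needed to target another language.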

Inside the Step 3 directory, you should now have a file called ru-en-wiktionary-dict.json.

This file contains the tidied-up data, ready to use in other projects.