Primary language: Python

Wordlist Generator

Generate word list XML files suitable for IME dictionaries.

The goal is to generate dictionary files for the Firefox OS keyboard, targeting rare languages and using Wikipedia as the corpus.

The XML format follows the one used in the Android source code.
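
For orientation, here is a minimal sketch of what producing such a file could look like. It is written in Python and is independent of this project's code. It assumes the Android-style layout of a `<wordlist>` root holding `<w f="...">word</w>` entries with `f` in the range 1 to 255. The function name `build_wordlist_xml`, the tokenizer, and the linear frequency scaling are all illustrative assumptions, not this tool's actual behaviour.

```python
import re
from collections import Counter
from xml.sax.saxutils import escape, quoteattr


def build_wordlist_xml(text, locale, max_words=1000):
    """Return a word list XML string built from raw corpus text.

    Hypothetical helper: the element names follow the Android-style
    format assumed above, the scaling rule is a simple illustration.
    """
    # Keep runs of letters only (Unicode-aware), lowercased.
    words = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(words).most_common(max_words)

    lines = ["<wordlist locale={}>".format(quoteattr(locale))]
    if counts:
        top = counts[0][1]  # count of the most frequent word
        for word, n in counts:
            # Scale linearly so the most frequent word gets f=255.
            f = max(1, round(255 * n / top))
            lines.append('  <w f="{}">{}</w>'.format(f, escape(word)))
    lines.append("</wordlist>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_wordlist_xml("y ci y gath y ci", "cy"))
```

A real corpus would need more careful tokenization (apostrophes, hyphens, language-specific rules), which is where a dedicated NLP library would come in.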

Install

Clone or fork this repo, then run:

$ npm install

Usage

$ node bin/generate xx

Where xx is the language code of the target language. See languages_code.json for a list of all available languages.
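
As an illustration of how a language code can select the corpus, the sketch below maps a code to the matching Wikipedia edition, which lives on its own subdomain (for example `cy.wikipedia.org` for Welsh). This is a Python sketch under assumptions: the function name `wikipedia_api_url` and the loose format check are hypothetical, and this project may fetch its corpus differently.

```python
import re


def wikipedia_api_url(code):
    """Return the MediaWiki API endpoint for a Wikipedia language code.

    Hypothetical helper: only checks the general shape of the code
    (lowercase letters, optionally hyphen-separated parts); it does not
    verify that the edition actually exists.
    """
    if not re.fullmatch(r"[a-z]+(-[a-z0-9]+)*", code):
        raise ValueError("unexpected language code: {!r}".format(code))
    return "https://{}.wikipedia.org/w/api.php".format(code)


if __name__ == "__main__":
    for code in ("cy", "la"):
        print(code, wikipedia_api_url(code))
```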

You can change the temporary directory in config/settings.json.

Todo

  • Pluggable corpus other than Wikipedia
  • Use natural (the Node.js NLP library) for all NLP needs
  • Unit tests

Note

This project has only been tested with Welsh (cy) and Latin (la), on a Linux machine.