AnkiLanguageDecks

A program that creates Anki language decks from word-frequency lists, translated sentence pairs and text-to-speech audio.

Primary language: Python. License: MIT.

Plans for the first release

  • ✅ Frequency List
  • ✅ Pair Sentences
  • ❎ Tokenization, Stemming and Lemmatization
  • ❎ Scoring Pair Sentences according to difficulty
  • ❎ Assigning Frequency List to Pair Sentences
  • ❎ Including Audio from Tatoeba
  • ❎ Anki Deck automatic generation
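
One possible way to approach the "Scoring Pair Sentences according to difficulty" item is to rank each sentence by its rarest word. The sketch below is only an illustration, not the project's implementation. It assumes the frequency list is plain text with one `word count` pair per line, sorted from most to least frequent. The function names (`load_frequency_ranks`, `difficulty`) and the regex tokenizer are hypothetical placeholders for the planned tokenization step.

```python
import re
from typing import Dict, Iterable, List, Optional


def load_frequency_ranks(lines: Iterable[str]) -> Dict[str, int]:
    """Map each word to its 1-based rank.

    Assumes "word count" lines sorted by descending count; malformed
    lines are skipped.
    """
    ranks: Dict[str, int] = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue
        word = parts[0].lower()
        if word not in ranks:
            ranks[word] = len(ranks) + 1
    return ranks


def tokenize(sentence: str) -> List[str]:
    """Naive tokenizer: lowercase runs of word characters."""
    return re.findall(r"\w+", sentence.lower())


def difficulty(sentence: str, ranks: Dict[str, int],
               unknown_rank: Optional[int] = None) -> int:
    """Score a sentence by the rank of its rarest word (higher = harder).

    Words missing from the list get `unknown_rank`, which defaults to one
    past the end of the list. An empty sentence scores 0.
    """
    if unknown_rank is None:
        unknown_rank = len(ranks) + 1
    tokens = tokenize(sentence)
    if not tokens:
        return 0
    return max(ranks.get(token, unknown_rank) for token in tokens)


if __name__ == "__main__":
    sample = ["the 100", "cat 50", "sat 10"]
    ranks = load_frequency_ranks(sample)
    print(difficulty("The cat", ranks))  # rank of "cat"
    print(difficulty("The dog", ranks))  # "dog" is unknown
```

Taking the maximum rank treats a sentence as only as easy as its hardest word. An average rank or a sentence-length penalty would be equally reasonable alternatives.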

After the first release

  • Prioritize Pair Sentences that reuse words already seen from the Frequency List (open question: how best to approach this?)
  • Better packaging of solution for easier usage
  • Test coverage to facilitate understanding and collaboration
  • Parametrization to facilitate usage of alternative resources/corpora
  • Performance tuning
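
For the open question about prioritizing sentences with already seen words, one simple idea is a greedy ordering: repeatedly pick the sentence that introduces the fewest unseen words. This is a sketch of that idea under stated assumptions, not a decided design. The function name `order_by_new_words` and the regex tokenizer are hypothetical.

```python
import re
from typing import List, Set


def _words(sentence: str) -> Set[str]:
    """Naive tokenizer: the set of lowercase word-character runs."""
    return set(re.findall(r"\w+", sentence.lower()))


def order_by_new_words(sentences: List[str]) -> List[str]:
    """Greedily order sentences so each one adds as few new words as possible.

    Ties are broken by original position. Runs in O(n^2) over the number
    of sentences, so a large corpus would need a smarter data structure.
    """
    seen: Set[str] = set()
    remaining = [(sentence, _words(sentence)) for sentence in sentences]
    ordered: List[str] = []
    while remaining:
        # Index of the sentence with the fewest words not yet seen.
        best = min(range(len(remaining)),
                   key=lambda i: len(remaining[i][1] - seen))
        sentence, words = remaining.pop(best)
        ordered.append(sentence)
        seen |= words
    return ordered


if __name__ == "__main__":
    deck = ["the cat sat on the mat", "the cat", "the cat sat"]
    for sentence in order_by_new_words(deck):
        print(sentence)
```

The `seen` set could be seeded with the top entries of the Frequency List so that the ordering follows the list instead of starting from an empty vocabulary.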

Thanks to hermitdave/FrequencyWords for the frequency lists used in this project.

Thanks to kmicklas/sentence-pairs for the logic to extract pairs from Tatoeba files.

Thanks to the Tatoeba wiki (https://en.wiki.tatoeba.org/articles/show/make-anki) for a clear example of how to export translated sentences from Tatoeba.

How to get audio: judging by a discussion on GitHub, audio files should be accessible using just their language code and sentence ID. The URL scheme seems to be http://audio.tatoeba.org/sentences/<<language code>>/<<sentence id>>.mp3
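
The URL scheme above can be sketched as a small helper. This assumes the scheme still holds (it is inferred from a discussion, not from official documentation) and that the language code is the one Tatoeba uses for the sentence, for example `eng`. The function names are hypothetical, and not every sentence has audio, so a download may fail with an HTTP error.

```python
import urllib.request


def tatoeba_audio_url(language_code: str, sentence_id: int) -> str:
    """Build the audio URL from a language code and a sentence ID."""
    return (
        "http://audio.tatoeba.org/sentences/"
        f"{language_code}/{sentence_id}.mp3"
    )


def download_audio(language_code: str, sentence_id: int, target_path: str) -> str:
    """Download the audio file to `target_path` and return that path.

    Raises urllib.error.HTTPError if the sentence has no audio or the
    assumed URL scheme is no longer valid.
    """
    url = tatoeba_audio_url(language_code, sentence_id)
    urllib.request.urlretrieve(url, target_path)
    return target_path


if __name__ == "__main__":
    # Only prints the URL; call download_audio() to fetch the file.
    print(tatoeba_audio_url("eng", 1276))
```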

Alternatives to sentence pairing, word translation and audio:

  • Google Translate (unofficial) API: https://github.com/ssut/py-googletrans
  • Google Translate (unofficial) TTS: https://github.com/hungtruong/Google-Translate-TTS