- ✅ Frequency List
- ✅ Pair Sentences
- ❎ Tokenization, Stemming and Lemmatization
- ❎ Scoring Pair Sentences according to difficulty
- ❎ Assigning Frequency List to Pair Sentences
- ❎ Including Audio from Tatoeba
- ❎ Anki Deck automatic generation
- Prioritize Pair Sentences whose words have already been seen in the Frequency List (open question: how to approach this?)
- Better packaging of solution for easier usage
- Test coverage to facilitate understanding and collaboration
- Parametrization to facilitate usage of alternative resources/corpus
- Performance tuning
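
One possible way to approach the prioritization item above is to score each sentence by its rarest word and sort on that score. The sketch below is only an illustration, not the project's method: the function names are hypothetical, the frequency list is assumed to be ordered from most to least frequent, and tokenization is a naive `\w+` regex standing in for the tokenization step that is still pending.

```python
import re


def rank_words(frequency_list):
    """Map each word to its position in the list (0 = most frequent)."""
    return {word: rank for rank, word in enumerate(frequency_list)}


def sentence_score(sentence, ranks, unknown_rank):
    """Score a sentence by the rank of its rarest word (lower = easier)."""
    # Assumption: naive regex tokenization, lowercased.
    words = re.findall(r"\w+", sentence.lower())
    if not words:
        return unknown_rank
    return max(ranks.get(word, unknown_rank) for word in words)


def order_pairs(pairs, frequency_list):
    """Sort (sentence, translation) pairs so the easiest sentences come first."""
    ranks = rank_words(frequency_list)
    # Words missing from the list rank just below the rarest known word.
    unknown_rank = len(frequency_list)
    return sorted(
        pairs,
        key=lambda pair: sentence_score(pair[0], ranks, unknown_rank),
    )
```

With this ordering, a sentence appears only after every word it contains has been reached in the frequency list, and sentences with words outside the list sort last.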
Thanks to hermitdave/FrequencyWords for the frequency lists used in this project.
Thanks to kmicklas/sentence-pairs for the logic to extract pairs from Tatoeba files.
Thanks to the Tatoeba wiki (https://en.wiki.tatoeba.org/articles/show/make-anki) for a clear example of how to export translated sentences from Tatoeba.
How to get audio:
Judging by this discussion on GitHub, you should be able to access audio files using just their language code and sentence ID. The URL scheme seems to be http://audio.tatoeba.org/sentences/<<language code>>/<<sentence id>>.mp3
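
The URL scheme above can be sketched as a small helper. This is a minimal illustration, not code from the project: the function names are hypothetical, the language code is assumed to be the three-letter code Tatoeba uses (such as `eng`), and the scheme itself is only what the discussion suggests, so a request may fail for sentences that have no audio.

```python
import urllib.request


def tatoeba_audio_url(language_code, sentence_id):
    """Build the audio URL from a language code and a sentence ID."""
    return (
        "http://audio.tatoeba.org/sentences/"
        f"{language_code}/{sentence_id}.mp3"
    )


def download_audio(language_code, sentence_id, destination):
    """Save the audio file to `destination`.

    Raises urllib.error.HTTPError when the server rejects the request,
    for example when the sentence has no audio.
    """
    url = tatoeba_audio_url(language_code, sentence_id)
    with urllib.request.urlopen(url) as response:
        data = response.read()
    with open(destination, "wb") as audio_file:
        audio_file.write(data)
```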
Alternatives to sentence pairing, word translation and audio:
- Google Translate (unofficial) API: https://github.com/ssut/py-googletrans
- Google Translate (unofficial) TTS: https://github.com/hungtruong/Google-Translate-TTS