How to prepare data
akanshajainn opened this issue · 1 comment
akanshajainn commented
This repo provides the data used to train the MT system as a zipped archive. I am planning to try it on a different set of languages, but I am stuck on how to prepare the data for that. I know how to tokenise and binarize the data, but I don't know how to obtain the dictionary and the first translation data.
guillaumekln commented
As described in the README, the first translation data were generated by this project: https://github.com/jsenellart/papers/tree/master/WordTranslationWithoutParallelData
You should also be able to use the official project: https://github.com/facebookresearch/MUSE
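To give a rough idea of what producing the first translation data could look like once you have a dictionary, here is a minimal sketch. It assumes the first translation is a plain word-by-word replacement using the induced dictionary, and that the dictionary is a text file with one whitespace-separated `source target` pair per line. The function names `load_dictionary` and `translate_word_by_word` and the toy word pairs are hypothetical and are not taken from either project.

```python
from typing import Dict, Iterable


def load_dictionary(lines: Iterable[str]) -> Dict[str, str]:
    """Parse 'source target' pairs, keeping the first target seen per source word.

    Assumption: one whitespace-separated pair per line.
    """
    dictionary: Dict[str, str] = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        source, target = parts
        dictionary.setdefault(source, target)
    return dictionary


def translate_word_by_word(sentence: str, dictionary: Dict[str, str]) -> str:
    """Replace each token with its dictionary entry; copy unknown tokens unchanged."""
    return " ".join(dictionary.get(token, token) for token in sentence.split())


if __name__ == "__main__":
    # Toy pairs for illustration only; a real dictionary would come from
    # one of the projects linked above.
    toy_pairs = ["cat chat", "the le", "sleeps dort", "cat minou"]
    dictionary = load_dictionary(toy_pairs)
    print(translate_word_by_word("the cat sleeps soundly", dictionary))
    # prints "le chat dort soundly"
```

The input sentences are expected to be tokenised already, with the same tokenisation and casing as the dictionary entries. Words missing from the dictionary are copied through unchanged in this sketch.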