OpenNMT/Hackathon

How to prepare data

akanshajainn opened this issue · 1 comment

In this repo you have provided zipped data for training the MT system. However, I am planning to try it on a different set of languages, and I am really stuck on how to prepare the data for that. I know how to tokenize and binarize the data, but I don't know how to get the dictionaries and the first translation data. How were those produced?

As described in the README, the first translation data were generated by this project: https://github.com/jsenellart/papers/tree/master/WordTranslationWithoutParallelData

You should also be able to use the official implementation of that paper, MUSE: https://github.com/facebookresearch/MUSE
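As a rough illustration of what "first translation data" can look like: in unsupervised MT setups of this kind, an initial translation is often produced word by word from a bilingual dictionary induced without parallel data. The sketch below is only that, a sketch. It assumes a MUSE-style dictionary file with one whitespace-separated `source target` pair per line and already-tokenized input, one sentence per line. The script name, file names, and the functions `load_dictionary` and `translate_word_by_word` are hypothetical, and this is not necessarily how the Hackathon data were generated.

```python
import sys
from typing import Dict, Iterable, Iterator


def load_dictionary(lines: Iterable[str]) -> Dict[str, str]:
    """Build a source->target word map from 'src tgt' lines.

    Assumption: MUSE-style dictionary, one whitespace-separated pair per line.
    If a source word has several candidates, the first one listed is kept.
    """
    mapping: Dict[str, str] = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        src, tgt = parts
        mapping.setdefault(src, tgt)
    return mapping


def translate_word_by_word(sentences: Iterable[str],
                           mapping: Dict[str, str]) -> Iterator[str]:
    """Translate tokenized sentences token by token.

    Tokens missing from the dictionary (numbers, names, rare words) are
    copied unchanged to the output.
    """
    for sentence in sentences:
        yield " ".join(mapping.get(tok, tok) for tok in sentence.split())


# Hypothetical CLI: python word_by_word.py dict.src-tgt.txt < train.tok.src > train.wbw.tgt
if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1], encoding="utf-8") as f:
        word_map = load_dictionary(f)
    for translated in translate_word_by_word(sys.stdin, word_map):
        print(translated)
```

Tokens are matched exactly, so the source text should be normalized (e.g. lowercased) the same way as the dictionary entries; otherwise most words will simply be copied through.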