How to prepare data

Question


akanshajainn opened this issue 6 years ago · 1 comment

In this repo, you have provided the zipped data used to train the MT system. However, I am planning to try it on a different set of languages, and I am stuck on how to prepare the data for them. I know how to tokenize and binarize the data, but I don't know how to obtain the bilingual dictionary and the initial translation data.

Answer 1 · 2018-10-04T07:58:31.000Z

As described in the README, the initial translation data were generated with this project: https://github.com/jsenellart/papers/tree/master/WordTranslationWithoutParallelData
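To make this step a bit more concrete: the linked project learns a bilingual dictionary without parallel data, and one simple way to turn such a dictionary into initial translation data is word-by-word translation. The sketch below is a minimal illustration under assumptions, not that project's code: it assumes a plain-text dictionary with one `source target` pair per line, and `load_dictionary` and `translate_word_by_word` are hypothetical helper names.

```python
def load_dictionary(lines):
    """Parse 'source target' pairs (assumed format), keeping the first target seen per source word."""
    dictionary = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        src, tgt = parts
        dictionary.setdefault(src, tgt)  # later duplicates of the same source word are ignored
    return dictionary


def translate_word_by_word(sentence, dictionary):
    """Replace each whitespace-separated token by its translation; unknown tokens are copied unchanged."""
    return " ".join(dictionary.get(tok, tok) for tok in sentence.split())


if __name__ == "__main__":
    # Toy French-English pairs, made up for illustration only.
    d = load_dictionary(["le the", "chat cat", "noir black"])
    print(translate_word_by_word("le chat noir dort", d))  # the cat black dort
```

The output keeps the source word order and copies out-of-vocabulary words ("dort"), so it is only a rough starting point for the translation model, not a fluent translation.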

You should also be able to use the official MUSE project from Facebook Research: https://github.com/facebookresearch/MUSE
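If you go the MUSE route, one possible way to get a dictionary out of the aligned embeddings is a nearest-neighbour lookup. This sketch assumes the aligned source and target embeddings have been saved as fastText-style `.vec` text (an optional `count dim` header, then a word followed by its values on each line); `read_vec` and `induce_dictionary` are hypothetical helpers. It uses plain cosine similarity, whereas MUSE also offers the CSLS criterion, which was designed to reduce the hubness problem of plain nearest neighbour.

```python
import math


def read_vec(lines):
    """Parse fastText-style .vec text (assumed format): optional 'count dim' header, then 'word v1 v2 ...'."""
    vectors = {}
    for i, line in enumerate(lines):
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if i == 0 and len(parts) == 2:
            continue  # treat a two-field first line as the 'count dim' header
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec


def induce_dictionary(src_vecs, tgt_vecs):
    """Map every source word to its cosine nearest neighbour among the target words.

    Plain nearest neighbour only; MUSE's CSLS criterion is not reproduced here.
    """
    tgt_items = [(word, normalize(vec)) for word, vec in tgt_vecs.items()]
    dictionary = {}
    for src_word, src_vec in src_vecs.items():
        s = normalize(src_vec)
        best_word, _ = max(tgt_items, key=lambda item: sum(a * b for a, b in zip(s, item[1])))
        dictionary[src_word] = best_word
    return dictionary


if __name__ == "__main__":
    # Toy 2-d "aligned" embeddings, made up for illustration only.
    src = read_vec(["2 2", "chat 1.0 0.1", "chien 0.1 1.0"])
    tgt = read_vec(["2 2", "cat 0.9 0.0", "dog 0.0 0.8"])
    print(induce_dictionary(src, tgt))  # {'chat': 'cat', 'chien': 'dog'}
```

The pairwise loop is quadratic in vocabulary size, so on real embeddings you would probably restrict both sides to the most frequent words (or use a vectorized library) before building the dictionary.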