In this project, I fine-tuned both mT5 and M2M-100 on a small dataset (10k sentence pairs) of Yoruba-English data and compared the results. Both models were pre-trained on multilingual data and both are used for translation. They were released around the same time, and mT5 is a bit larger than M2M-100.
I used both fairseq and simpletransformers for training, which took around two hours on a P100 GPU.
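Below is a minimal sketch of the simpletransformers side of the fine-tuning, assuming the parallel data is in a tab-separated file with Yoruba source and English target columns; the file name, checkpoint ("google/mt5-base"), and hyperparameters are illustrative, not necessarily the exact ones used in this repo.

```python
import pandas as pd
from simpletransformers.t5 import T5Model, T5Args

# Load the parallel data (hypothetical file name and layout).
train_df = pd.read_csv(
    "yo_en_train.tsv", sep="\t", names=["input_text", "target_text"]
)
# T5-style models expect a task prefix column.
train_df["prefix"] = "translate Yoruba to English"

# Illustrative hyperparameters for a small 10k-pair dataset.
model_args = T5Args()
model_args.num_train_epochs = 3
model_args.max_seq_length = 128
model_args.train_batch_size = 8
model_args.overwrite_output_dir = True

# "google/mt5-base" is an assumption; any mT5 checkpoint works the same way.
model = T5Model("mt5", "google/mt5-base", args=model_args)
model.train_model(train_df)

# Translate a held-out Yoruba sentence (prefix must be included in the input).
preds = model.predict(["translate Yoruba to English: Bawo ni o se wa?"])
print(preds)
```

The M2M-100 run goes through fairseq's CLI tooling instead, so it is not shown here.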
If you find this useful, please star the repo.