Add Samanantar datasets.

Samanantar is the largest publicly available collection of parallel corpora for Indic languages: Assamese, Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Oriya, Punjabi, Tamil, and Telugu. The corpus contains 49.6M sentence pairs between English and these Indic languages.

https://ai4bharat.iitm.ac.in/samanantar

Related #119
We already had it (#34), but they changed the links.

@BrightXiaoHan Thanks for creating this issue. If this is urgent, could you please update this link to v0.3 (or the newest version) from https://ai4bharat.iitm.ac.in/samanantar

mtdata/mtdata/index/ai4bharat.py, line 17 (commit c57dab5):

```python
BASE_v0_2 = 'https://storage.googleapis.com/samanantar-public/V0.2/data/{dirname}/{pair}.zip'
```

and test if it works! Thanks
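One possible way to check an updated link before sending a PR is to fill the template's `{dirname}` and `{pair}` placeholders and send an HTTP HEAD request to the result. The sketch below is a hypothetical helper, not part of mtdata: `build_url` and `url_is_reachable` are illustrative names, and the placeholder values you pass must be the ones `ai4bharat.py` actually uses (they are not shown in this thread). A v0.3 URL template is also not given here, so the example uses the existing `BASE_v0_2` string.

```python
from urllib.request import Request, urlopen

# URL template copied from mtdata/mtdata/index/ai4bharat.py (line 17, commit c57dab5).
BASE_v0_2 = 'https://storage.googleapis.com/samanantar-public/V0.2/data/{dirname}/{pair}.zip'


def build_url(template: str, dirname: str, pair: str) -> str:
    """Fill the {dirname} and {pair} placeholders of a download-URL template."""
    return template.format(dirname=dirname, pair=pair)


def url_is_reachable(url: str, timeout: float = 10.0) -> bool:
    """Return True if a HEAD request to `url` ends with a 2xx/3xx status.

    urlopen follows redirects and raises HTTPError for 4xx/5xx responses.
    HTTPError, URLError and socket timeouts are all subclasses of OSError,
    so any of them is reported as "not reachable".
    """
    req = Request(url, method='HEAD')
    try:
        with urlopen(req, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except OSError:
        return False
```

For example, `url_is_reachable(build_url(BASE_v0_2, dirname=..., pair=...))` with the real placeholder values should return `True` if the link is live. Some servers reject HEAD requests, so a `False` result is a hint to retry with a normal download rather than proof that the link is broken.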

Thanks for your reply. I will try to test it.