nlpcuom/English-Tamil-Parallel-Corpus

What is the source of the dataset?

Vedant2311 opened this issue · 0 comments

Hello there.

I am currently working on a machine translation project at IIT Delhi and might want to use this dataset. Before doing so, it would be very helpful to know more details about it.

I came across this git repo from the WAT-2020 Indic MT task description (http://lotus.kuee.kyoto-u.ac.jp/WAT/indic-multilingual/). The hyperlink named "NLPC" there points to your GitHub page. The README here says little about how this parallel corpus was created or whether the dataset is associated with any published research paper. I also explored the web page of NLPC at the University of Moratuwa, but it only provides links to these datasets, without any citations.

Can you please help me with this?