Sumerian - English Machine Translation (GSoC - 2020)

As part of the MTAAC project, the organization hosts Sumerian data comprising approximately 1.5 million transliterated lines and a parallel corpus of approximately 10K lines. We have already developed a neural encoder-decoder architecture for English-Sumerian machine translation, but it leverages only the parallel dataset, which is not sufficient to achieve state-of-the-art results. Your task is to develop a language model that uses the monolingual data as well as the parallel data to translate Sumerian phrases to English, and vice versa.
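One common way to exploit monolingual transliterations is masked-language-model (MLM) pretraining, as in BERT and in the MLM objective of the Cross-lingual Language Model (XLM). The sketch below shows only the data-corruption step, assuming whitespace-tokenized lines. The function name `mask_tokens`, the `[MASK]` symbol, and the 15% selection rate with an 80/10/10 replacement split are BERT-style defaults chosen for illustration, not settings taken from this project.

```python
import random
from typing import List, Optional, Sequence, Tuple

MASK = "[MASK]"  # illustrative mask symbol (assumption, not a project setting)


def mask_tokens(tokens: Sequence[str], vocab: Sequence[str], rng: random.Random,
                mask_prob: float = 0.15) -> Tuple[List[str], List[Optional[str]]]:
    """BERT/XLM-style masking sketch: returns (corrupted input, labels).

    Each token is selected with probability mask_prob. A selected token is
    replaced by MASK 80% of the time, by a random vocabulary token 10% of the
    time, and kept unchanged 10% of the time. labels holds the original token
    at selected positions and None elsewhere (positions the loss ignores).
    """
    inputs: List[str] = []
    labels: List[Optional[str]] = []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)  # the model must predict the original token here
            r = rng.random()
            if r < 0.8:
                inputs.append(MASK)
            elif r < 0.9:
                inputs.append(rng.choice(list(vocab)))
            else:
                inputs.append(tok)
        else:
            labels.append(None)  # not selected: no prediction target
            inputs.append(tok)
    return inputs, labels


# Usage on a toy transliterated line (illustrative data only).
rng = random.Random(0)
line = "lugal e2 mu-du3".split()
corrupted, targets = mask_tokens(line, vocab=line, rng=rng)
```

The random-token and keep-unchanged cases exist so the model cannot assume that every unmasked token is correct; in a real pipeline the same corruption would be applied to token ids rather than strings.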

Possible Mentors:

Niko Schenk
Ravneet Punia

Link for the Dataset: TODO

Your Tasks & Desired (Minimum) Outcomes:

Train and evaluate different models and architectures on standard train/development/test splits. Experiment with a wide range of hyperparameter settings to obtain the best performance. Perform a quantitative and qualitative evaluation of the translations. Achieve better accuracy than the previous year's model. Test at least two different NMT approaches, such as a Cross-lingual Language Model, Dual Learning, or Back-Translation. Students with a research background will be preferred.
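The task description does not name a metric for the quantitative evaluation. BLEU is the usual choice in MT, so the following is a minimal corpus-level BLEU sketch, assuming pre-tokenized text, one reference per sentence, uniform weights up to 4-grams, and no smoothing. For scores compared against the previous year's model, a standard implementation such as sacreBLEU is the safer option, since tokenization choices change BLEU noticeably.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count all contiguous n-grams of length n."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses: List[List[str]], references: List[List[str]],
                max_n: int = 4) -> float:
    """Corpus-level BLEU sketch: one reference per sentence, no smoothing."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            h, r = ngrams(hyp, n), ngrams(ref, n)
            # Clipped counts: an n-gram is credited at most as often as it appears in the reference.
            matches[n - 1] += sum(min(c, r[g]) for g, c in h.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # some n-gram order has no match: unsmoothed BLEU is zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: punish hypotheses shorter than the references.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


# Usage with toy English outputs (illustrative data only).
ref = "the king built the house".split()
hyp = "the king built the temple".split()
score = corpus_bleu([hyp], [ref])
```

Scores are computed over the whole test set rather than averaged per sentence, which is how BLEU is normally reported; the qualitative evaluation still needs human inspection of the translations.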

Getting started:

Cross-lingual Language Model
Dual Learning
Back-Translation for Unsupervised NMT
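Back-translation turns target-side monolingual text into synthetic parallel data: a reverse model translates each real target sentence, and the resulting (synthetic source, real target) pairs are added to the genuine parallel corpus. Because the large monolingual corpus here is Sumerian, back-translating it with a Sumerian-to-English model yields pairs with real Sumerian on the target side, which directly train the English-to-Sumerian direction; alternating the process between directions (iterative back-translation) extends the benefit to both. In this sketch, `back_translate`, `augment`, and the glossary-based `toy_sux_to_en` are hypothetical stand-ins, not part of any existing codebase; a trained model would replace the toy translator.

```python
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def back_translate(target_mono: List[str], reverse_model: Callable[[str], str]) -> List[Pair]:
    """Pair each real target-side sentence with a synthetic source from the reverse model."""
    return [(reverse_model(tgt), tgt) for tgt in target_mono]


def augment(parallel: List[Pair], target_mono: List[str],
            reverse_model: Callable[[str], str]) -> List[Pair]:
    """Real parallel pairs followed by back-translated pairs; the forward model trains on both."""
    return parallel + back_translate(target_mono, reverse_model)


def toy_sux_to_en(sentence: str) -> str:
    # Hypothetical stand-in for a trained Sumerian->English model: word-by-word gloss lookup.
    glossary: Dict[str, str] = {"lugal": "king", "e2": "house", "mu-du3": "built"}
    return " ".join(glossary.get(w, "<unk>") for w in sentence.split())


# Usage: English->Sumerian training data (source = English, target = Sumerian).
parallel_en_sux = [("king", "lugal")]       # toy "real" parallel pair
mono_sumerian = ["lugal e2 mu-du3"]         # toy monolingual transliteration
training_data = augment(parallel_en_sux, mono_sumerian, toy_sux_to_en)
```

In practice the synthetic pairs are often tagged or down-weighted so the model can distinguish them from genuine translations, and the reverse model is retrained as the forward model improves.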

Shashankjain12/Unsupervised-NMT-for-Sumerian-English
