NMT_challenge

This repository contains work for the Neural Machine Translation (NMT) challenge conducted in the CS779A course.

Primary language: Jupyter Notebook. License: MIT.

Neural Machine Translation Challenge (CS779A)

Machine Translation from English to an Indian Language and vice versa.

Overview

Translating from one language to another no longer requires a bilingual dictionary. Today, when you encounter words in a foreign language, you can get a translation instantly from an online translation service. This automatic translation of text from one language into another is what machine translation is. Machine translation is now so widespread that Google Translate claims to translate more than 100 billion words every day.

In this challenge, the participant's goal is to translate from the source language (English) into the target languages (Bengali, Gujarati, Hindi, Kannada, Malayalam, Tamil and Telugu), and from those languages back into English, in separate phases.

Phases

There are two phases:

  • Translation from English to an Indian language: In this phase, you must train a model from scratch (no pretrained models allowed) to translate English sentences into their Indian-language counterparts. Training/validation runs from September 4, 2023 00:00:00 IST to September 24, 2023 23:59:59 IST, and testing runs from September 25, 2023 00:00:00 IST to September 26, 2023 23:59:59 IST. Train your model on train_data1.json; your submission will be evaluated on test_data1.json, which will be provided later. Transformer-based models are allowed, but they must be trained from scratch.
  • Translation from an Indian language to English: Training/validation runs from September 27, 2023 00:00:00 IST to October 20, 2023 23:59:59 IST, and testing runs from October 21, 2023 00:00:00 IST to October 22, 2023 23:59:59 IST. Train your model on train_data2.json (to be provided later); your submission will be evaluated on test_data2.json, which will also be provided later.
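Both phases start from a JSON training file and separate training from validation. The sketch below loads sentence pairs and holds out a validation split. It assumes a hypothetical layout in which the file maps a language-pair key (for example "English-Hindi") to a "Train" split whose entries are {"source": ..., "target": ...} records keyed by sentence ID; the real layout of train_data1.json may differ, so check it against the provided data-processing script before relying on this.

```python
import json
import random
from pathlib import Path


def load_pairs(path, lang_pair, split="Train"):
    """Return a list of (id, source, target) tuples for one language pair.

    ASSUMPTION (hypothetical layout): data[lang_pair][split] maps each
    sentence ID to a {"source": ..., "target": ...} record.
    """
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    records = data[lang_pair][split]
    return [(sid, rec["source"], rec["target"]) for sid, rec in records.items()]


def train_val_split(pairs, val_fraction=0.1, seed=0):
    """Shuffle a copy of the pairs and hold out a fraction for validation."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)  # fixed seed -> reproducible split
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


if __name__ == "__main__":
    path = Path("train_data1.json")
    if path.exists():
        # "English-Hindi" is a hypothetical key name; adjust to the real file.
        pairs = load_pairs(path, "English-Hindi")
        train, val = train_val_split(pairs)
        print(len(train), "training pairs,", len(val), "validation pairs")
```

Holding out a slice of the training file gives a local estimate of quality during the training/validation window, before the test file is released.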

Transformer-based models are allowed in both phases, provided they are trained from scratch (no pretrained models).
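Since nothing pretrained is allowed, one conservative reading is to learn every component, including the vocabulary, from the training data alone (the rules do not say explicitly whether pretrained tokenizers count). The sketch below builds a simple word-level vocabulary with whitespace tokenization; the special tokens and the `min_freq` cutoff are illustrative choices, not part of the challenge. A subword vocabulary (such as BPE) learned on the same training data is a common alternative for morphologically rich languages.

```python
from collections import Counter

# Hypothetical special tokens; any model trained from scratch can define its own.
SPECIALS = ["<pad>", "<unk>", "<bos>", "<eos>"]


def build_vocab(sentences, min_freq=2, max_size=None):
    """Build a token -> index map from training sentences only."""
    counts = Counter(tok for s in sentences for tok in s.split())
    # Keep frequent tokens; sort by descending count, then alphabetically,
    # so the same data always yields the same indices.
    words = sorted((w for w, c in counts.items() if c >= min_freq),
                   key=lambda w: (-counts[w], w))
    if max_size is not None:
        # max_size includes the special tokens.
        words = words[: max(0, max_size - len(SPECIALS))]
    itos = SPECIALS + words
    return {tok: i for i, tok in enumerate(itos)}


def encode(sentence, vocab):
    """Map a sentence to indices, wrapping it in <bos>/<eos>; unknown words -> <unk>."""
    unk = vocab["<unk>"]
    body = [vocab.get(tok, unk) for tok in sentence.split()]
    return [vocab["<bos>"]] + body + [vocab["<eos>"]]


if __name__ == "__main__":
    vocab = build_vocab(["a b a", "b c", "a"], min_freq=2)
    print(vocab)
    print(encode("a c b", vocab))
```

Separate vocabularies would typically be built for the source and target sides of each language pair.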

Submission

Your submission.zip should contain the following file:

  • answer.csv: two tab-separated columns, where the first column holds the unique ID and the second holds the translation generated by your model. Make sure the ID in each row exactly matches the one in the original validation/test file; otherwise your score will be incorrect.
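The sketch below writes answer.csv in the two-column, tab-separated layout described above and packages it as submission.zip. It checks that the IDs match the val/test file row by row before writing, and collapses any whitespace inside a translation so that a stray tab or newline cannot break the two-column layout. Whether a header row is expected is not stated here, so the optional `header` parameter (a hypothetical convenience) defaults to none; compare against the sample submission files.

```python
import zipfile


def write_submission(ids, translations, expected_ids, header=None,
                     csv_path="answer.csv", zip_path="submission.zip"):
    """Write answer.csv (ID <TAB> translation) and zip it as submission.zip."""
    ids = [str(i) for i in ids]
    if ids != [str(i) for i in expected_ids]:
        raise ValueError("IDs must match the val/test file exactly, row by row")
    if len(ids) != len(translations):
        raise ValueError("need exactly one translation per ID")
    with open(csv_path, "w", encoding="utf-8", newline="") as f:
        if header is not None:
            f.write("\t".join(header) + "\n")
        for sid, hyp in zip(ids, translations):
            # Collapse tabs, newlines and repeated spaces into single spaces.
            f.write(f"{sid}\t{' '.join(hyp.split())}\n")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.write(csv_path, arcname="answer.csv")


if __name__ == "__main__":
    write_submission(["101", "102"], ["translation one", "translation two"],
                     expected_ids=["101", "102"])
```

Plain string writes are used instead of the `csv` module so that quotation marks inside a translation are written as-is rather than wrapped in CSV quoting.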

Sample submission files, a data-processing script and a baseline model are available here. To submit, go to "Submit/View Results" and fill out the necessary details.

Results

The leaderboard is available under "Results". The top three scorers will be highlighted in green.