Primary language: Jupyter Notebook. License: MIT.

Neural_Machine_Translator_seq2seq

Neural Machine Translator (NMT) for translating English text to Hindi, built with the PyTorch framework using a seq2seq architecture with attention.
The Jupyter Notebook in this repository is self-explanatory and well documented.
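
As a rough illustration of what the attention step in a seq2seq decoder computes, here is a minimal sketch in plain Python (not PyTorch, so it runs without any dependencies). It assumes dot-product scoring, which may differ from the scoring function used in the notebook; the names `softmax`, `attend`, `decoder_state` and `encoder_states` are hypothetical and chosen only for this example.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, encoder_states):
    """Dot-product attention (an assumed scoring choice for this sketch).

    Returns one weight per encoder state and the weighted sum of the
    encoder states (the context vector).
    """
    # Score each encoder state by its dot product with the decoder state.
    scores = [
        sum(d * e for d, e in zip(decoder_state, enc))
        for enc in encoder_states
    ]
    weights = softmax(scores)
    dim = len(decoder_state)
    # Context vector: attention-weighted average of the encoder states.
    context = [
        sum(w * enc[i] for w, enc in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context


# Toy example: three source positions, hidden size 2.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = attend([1.0, 0.0], encoder_states)
```

In a real model the decoder would combine `context` with its own hidden state before predicting the next Hindi word, and the weights would be computed on tensors for a whole batch at once.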

Dependencies

PyTorch == 0.3.0
NumPy == 1.14.2

This blog explains NMT really well!

Dataset

The eng-hind.txt parallel corpus can be downloaded from several sources:

  1. IIT-Bombay Dataset
  2. HindEnCorp 0.5
  3. Indian parallel Corpora

The dataset file should be tab-separated, with an English sentence and its Hindi translation on each line, like this:
I am cold.                मुझे ठंड लग रही है।
My name is yash    मेरा नाम यश है
.                                .
.                                .
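
A file in this format can be read into sentence pairs with a few lines of Python. The following is a minimal sketch, not the notebook's own loading code; the function name `read_pairs` is hypothetical, and it assumes each valid line holds exactly one tab.

```python
def read_pairs(lines):
    """Split tab-separated lines into (english, hindi) tuples."""
    pairs = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        parts = line.split("\t")
        if len(parts) != 2:
            continue  # skip lines that do not have exactly two columns
        english, hindi = (p.strip() for p in parts)
        pairs.append((english, hindi))
    return pairs


# Two sample lines in the format described above.
sample = [
    "I am cold.\tमुझे ठंड लग रही है।\n",
    "My name is yash\tमेरा नाम यश है\n",
]
pairs = read_pairs(sample)
```

To load the real corpus, pass an open file instead of the sample list, for example `with open("eng-hind.txt", encoding="utf-8") as f: pairs = read_pairs(f)`. Opening the file as UTF-8 matters because the Hindi column is written in Devanagari.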

The Jupyter notebook given here is for educational purposes. If you want to see good translation results, I highly recommend cloning one of the following repositories:

  1. Stanford NMT [Matlab]
  2. tf-seq2seq [TensorFlow]
  3. Nematus [Theano]
  4. OpenNMT [Torch, in Lua]: highly recommended; incorporates all the functionality
  5. OpenNMT-py [PyTorch]

Papers

A Statistical Approach to Machine Translation, 1990.
Review Article: Example-based Machine Translation, 1999.
Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation, 2014.
Neural Machine Translation by Jointly Learning to Align and Translate, 2014.
Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation, 2016.
Sequence to Sequence Learning with Neural Networks, 2014.
Recurrent Continuous Translation Models, 2013.
Continuous Space Translation Models for Phrase-Based Statistical Machine Translation, 2013.

Acknowledgements

A big thank you to the whole team at Messy Fractals, especially Dhanya P and Arvind Sivdas, for letting me work under them on this project.

References

Credit for this code goes to the user spro. I have only made some changes to it to handle Hindi text.
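
To give a sense of the kind of change Hindi text needs, here is a hedged sketch of a normalization step. It is an assumption for illustration, not the exact change made in the notebook, and the name `normalize_hindi` is hypothetical. The point is that preprocessing written for English often strips combining marks or non-Latin characters, which would destroy Devanagari vowel signs, so this version leaves them untouched.

```python
import re
import unicodedata


def normalize_hindi(text):
    """Normalize a Hindi sentence while keeping Devanagari intact."""
    # NFC normalization; unlike accent stripping, it does not remove
    # combining marks such as Devanagari vowel signs.
    text = unicodedata.normalize("NFC", text.strip())
    # Put a space before sentence punctuation, including the
    # danda (U+0964), so it becomes a separate token.
    text = re.sub(r"([.!?\u0964])", r" \1", text)
    # Collapse runs of whitespace into single spaces.
    return re.sub(r"\s+", " ", text).strip()


normalized = normalize_hindi("  मुझे ठंड लग रही है।  ")
```

English sentences can still go through their usual lowercasing and ASCII cleanup; only the Hindi side needs this gentler treatment.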