Neural-Machine-Translation_Transliteration

An Intelligent Approach for Translation / Transliteration using Neural Networks


This translation approach is based on Recurrent Neural Networks (RNNs), the type of neural network suited to sequential input such as video, audio, or, as in our case, text.
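To make the recurrence concrete, here is a minimal sketch of a single vanilla RNN step in NumPy. The dimensions, weight names, and random inputs are illustrative assumptions, not code from this repository:

```python
import numpy as np

rng = np.random.default_rng(0)

input_dim, hidden_dim = 4, 8
W_xh = rng.standard_normal((hidden_dim, input_dim)) * 0.1   # input -> hidden
W_hh = rng.standard_normal((hidden_dim, hidden_dim)) * 0.1  # hidden -> hidden
b_h = np.zeros(hidden_dim)

def rnn_step(x_t, h_prev):
    """One recurrence: the new hidden state mixes the current input with the
    previous hidden state, which is what lets RNNs model sequences."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

# Run a toy sequence of 3 timesteps through the recurrence.
h = np.zeros(hidden_dim)
for x_t in rng.standard_normal((3, input_dim)):
    h = rnn_step(x_t, h)

print(h.shape)  # (8,)
```

Because the same weights are reused at every timestep, the hidden state `h` acts as a summary of everything read so far.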

RNNs

For the data, I used the bible-corpus. Download the corresponding raw XML files and place them in the directory (data/bible-corpus/raw/), then:

1. Extract the text from these files; the Jupyter Notebook (word-character embedding/XMLparser.ipynb) can help with this task.
2. Save the results in the directory (data/bible-corpus/pre-processed/).
3. Finally, run the script (createEmbeddings.sh) to generate the embeddings in the directory (data/bible-corpus/processed/).
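The extraction step can be sketched as follows. This is a hypothetical miniature, assuming the verses sit in `<seg>` elements as in the standard bible-corpus XML layout; the sample string and the `extract_verses` helper are illustrative, not taken from XMLparser.ipynb:

```python
import xml.etree.ElementTree as ET

# Tiny stand-in for a raw file from data/bible-corpus/raw/ (assumed layout).
sample_xml = """<cesDoc>
  <text>
    <body>
      <seg id="b.GEN.1.1" type="verse">In the beginning God created the heaven and the earth.</seg>
      <seg id="b.GEN.1.2" type="verse">And the earth was without form, and void.</seg>
    </body>
  </text>
</cesDoc>"""

def extract_verses(xml_string):
    """Pull the plain verse text out of every <seg> element."""
    root = ET.fromstring(xml_string)
    return [seg.text.strip() for seg in root.iter("seg") if seg.text]

verses = extract_verses(sample_xml)
print(len(verses))  # 2
```

The resulting plain-text verses are what would then be written to (data/bible-corpus/pre-processed/).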

For the embeddings themselves, I used fastText.
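fastText represents each word as a bag of character n-grams: the word is wrapped in `<` and `>` markers and all n-grams of a given length range are collected, and the word vector is built from the vectors of these subword pieces. A small sketch of the n-gram extraction (the function name and default lengths are illustrative):

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Collect the character n-grams fastText associates with a word.
    The word is wrapped in '<' and '>' so prefixes and suffixes are
    distinguishable from word-internal substrings."""
    wrapped = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams

print(char_ngrams("god", 3, 4))  # ['<go', 'god', 'od>', '<god', 'god>']
```

This subword decomposition is what makes embeddings for out-of-vocabulary words possible, as described below.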

The script (word-character embedding/getEmbedding.py) reads a word or a character from the user and checks whether its embedding is already stored in the SQLite database (word-character embedding/embeddingDB.db). If not, it computes the embedding with fastText, even for words that never appear in the training corpus: in that case, fastText derives the closest embedding from the word's characters.
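The cache-then-compute pattern behind getEmbedding.py can be sketched like this. The table name, schema, and the stub compute function are assumptions for illustration, not the repository's actual database layout:

```python
import sqlite3

def get_embedding(token, conn, compute_fn):
    """Return the cached vector for `token` from SQLite if present;
    otherwise compute it, store it, and return it."""
    cur = conn.execute("SELECT vector FROM embeddings WHERE token = ?", (token,))
    row = cur.fetchone()
    if row is not None:
        return [float(x) for x in row[0].split(",")]
    vector = compute_fn(token)  # e.g. fastText subword composition
    conn.execute("INSERT INTO embeddings (token, vector) VALUES (?, ?)",
                 (token, ",".join(str(x) for x in vector)))
    conn.commit()
    return vector

# Demo with an in-memory database and a stub compute function.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE embeddings (token TEXT PRIMARY KEY, vector TEXT)")
calls = []
def fake_compute(token):
    calls.append(token)
    return [0.1, 0.2, 0.3]

v1 = get_embedding("hello", conn, fake_compute)  # computed and cached
v2 = get_embedding("hello", conn, fake_compute)  # served from the cache
print(v1 == v2, len(calls))  # True 1
```

Caching in SQLite means repeated lookups of the same word skip the (comparatively slow) fastText computation.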

The Jupyter Notebook translate_dev.ipynb explains the whole pipeline: reading the training data, tokenization, embedding, and finally building and training the model.
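As a taste of the tokenization step, here is a minimal sketch of mapping words to integer ids so they can index into an embedding matrix. The special tokens, their ids, and the helper names are assumptions, not taken from translate_dev.ipynb:

```python
PAD, UNK = "<pad>", "<unk>"

def build_vocab(sentences):
    """Assign each distinct lowercase word a stable integer id."""
    vocab = {PAD: 0, UNK: 1}
    for sentence in sentences:
        for word in sentence.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab

def encode(sentence, vocab, max_len):
    """Convert a sentence to a fixed-length list of ids, padding or truncating."""
    ids = [vocab.get(w, vocab[UNK]) for w in sentence.lower().split()]
    return (ids + [vocab[PAD]] * max_len)[:max_len]

corpus = ["In the beginning", "the beginning of the earth"]
vocab = build_vocab(corpus)
print(encode("In the end", vocab, 5))  # [2, 3, 1, 0, 0]
```

The fixed-length id sequences produced here are what the embedding layer and the RNN-based model then consume.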