
Translatish - Video Translation System



A Project Report

_Submitted By:_

Aanshi Patwari (AU1841004), Dipika Pawar (AU1841052), Mayankkumar Tank (AU1841057), Rahul Chocha (AU1841076), Yash Patel (AU1841125)

at

School of Computer Studies (SCS)

Ahmedabad, Gujarat

Table of Contents

1. Introduction
   1.1. Problem Statement
   1.2. Project Overview/Specifications
   1.3. Motivation
2. Literature Review
   2.1. Existing System
   2.2. Proposed System
   2.3. Feasibility Study
3. System Analysis & Design
   3.1. Aim
   3.2. LSTM Architecture details
      3.2.1. Encoder
      3.2.2. Decoder
         3.2.2.1. Decoder Train Mode
         3.2.2.2. Decoder Inference Mode
   3.3. Impact of system
   3.4. Dataset
   3.5. Preprocessing
   3.6. Word Embedding
   3.7. Code Snippets (Algorithm and Pseudocode)
   3.8. Testing
   3.9. System Deployment
4. Results
   4.1. Outputs
   4.2. Working End Product (Web App)
      4.2.1. Technology Used
         4.2.1.1. Frontend
         4.2.1.2. Backend
      4.2.2. Snapshots of Web App
5. Conclusion and Future Work
   5.1. Conclusion
   5.2. Future Work
6. References
   6.1. Literature Review
   6.2. LSTM Architecture
7. Appendix
   7.1. List of Images
   7.2. List of Tables

1. Introduction

1.1. Problem Statement

  • Conversion of videos in the English language into Hindi, so that they can reach a wider audience across the country.

1.2. Project Overview/Specifications

  • The project proposes the use of neural machine translation techniques to convert videos in the English language into the Hindi language.
  • The technique involves the following steps (a minimal pipeline sketch follows the list):
    • Input: Video in the English language
    • Process:
      • Fetch the audio from the video
      • Convert the audio to text
      • Translate the English text to Hindi text
    • Output: Hindi text
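
A minimal sketch of this pipeline in Python is shown below. It assumes the moviepy and SpeechRecognition packages for the first two steps (the backend stack in Section 4.2 lists pydub for audio handling), and translate_en_to_hi is a hypothetical stand-in for the LSTM translator described in Section 3; this is an illustration, not the project's exact code.

```python
# Pipeline sketch: video -> audio -> English text -> Hindi text.
# Assumes moviepy and SpeechRecognition; translate_en_to_hi is a hypothetical
# stand-in for the trained seq2seq model described in Section 3.
import speech_recognition as sr
from moviepy.editor import VideoFileClip

def video_to_audio(video_path, audio_path="audio.wav"):
    """Extract the audio track from the video and save it as a WAV file."""
    VideoFileClip(video_path).audio.write_audiofile(audio_path)
    return audio_path

def audio_to_text(audio_path):
    """Transcribe English speech to text via the Google Web Speech API."""
    recognizer = sr.Recognizer()
    with sr.AudioFile(audio_path) as source:
        audio = recognizer.record(source)
    return recognizer.recognize_google(audio, language="en-IN")

def translate_video(video_path):
    english_text = audio_to_text(video_to_audio(video_path))
    return translate_en_to_hi(english_text)  # hypothetical translator
```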

1.3. Motivation

  • India is a multilingual country where more than 1,600 languages, belonging to more than 4 different language families, are spoken, with Hindi being the most widely spoken.
  • This beauty of language transcends the boundaries of the different states and cultures in our country. Learning a language other than one's mother tongue is a huge advantage, but the path to multilingualism is a never-ending process for most people in our country.
  • The majority of the population of our country finds it difficult to communicate in English and to understand English-language videos, so translating such videos can be useful.
  • Furthermore, converting visual content (i.e. videos) into an understandable language (here, Hindi) is an important step toward understanding the relationship between visual and linguistic information, which are the richest interaction modalities available to humans.

2. Literature Review

2.1. Existing System

  • Rule-Based Machine Translation (RBMT)
    • These models are based on the idea of using available dictionaries and hand-written rules of a particular language for translating into another language.
    • These models are prepared manually, which requires a large human task force of people who understand both languages (i.e. linguists).
  • Example-Based Machine Translation (EBMT)
    • These models are based on the idea of using an available set of translations to continue the translation from one language to another.
    • These models are limited by the number of example translations available.
  • Statistical Machine Translation (SMT)
    • Word-based
      • Compares similar words in both languages and learns the pattern by repeating the same operation many times, using the concept of bag-of-words.
      • This model had the limitation of not being able to consider the order of the words within a sentence.
    • Phrase-based
      • Compares similar phrases in both languages and learns the pattern by repeating the same operation many times, using the n-gram model.
      • This model also had the limitation of not being able to consider the order of phrases within a sentence.
    • Syntax-based
      • Uses the subject, the predicate, and other parts of the sentence to build a sentence tree from the language's syntax, combined with word- and phrase-level translation, to translate from one language to another.
      • The complexity of the model increases with a large dataset consisting of numerous sentences.
  • Neural Machine Translation (NMT)
    • Recurrent Neural Networks (RNN)
      • RNNs are used to reorder the source-to-target language translation with the help of semi-supervised learning methods.
      • Word embedding is used to generate vector values of words, which can be used for translation purposes.
      • These RNN models are based on a binary tree structure for mapping words in the target language to the vector values of the source language.
      • This model has a complex structure, which in turn leads to a larger number of computations.
    • Long Short-Term Memory Networks (LSTM)
      • LSTM works on the idea of sequence-to-sequence learning.
      • It uses an NMT model that has eight layers of encoders and decoders within a deep neural network.
      • The encoder in such models is often bidirectional, reading the input sequence in both directions.
      • Encoder-decoder structures are jointly trained to maximize the conditional probability of the target sequence given the input sequence.

2.2. Proposed System

  • LSTM model
    • This model is a special kind of RNN capable of learning long-term dependencies.
    • The model mainly consists of the following steps:
      • Decide the information to pass through the cell state
      • Decide what information to store in the cell state
      • Update the cell state by adding new information and discarding the information no longer needed
      • Predict the output based on the information

Figure 1-LSTM architecture.

2.3. Feasibility Study

  • Comparison between RNN and LSTM
    • Gated Recurrent Unit (GRU) gating values are close to LSTM values, with a shorter run time in comparison to plain RNNs.
    • The LSTM layer architecture is built in a way such that the network "decides" whether to modify its "internal memory" or not at each step.
    • If properly trained, the layer can thereby keep track of important events from further in the past, allowing for much richer inference.
    • LSTM networks were made with the purpose of solving the long-term dependency problems that traditional RNNs have.
    • They are especially good at making use of data from many steps ago (inherent in their nature) through their cell states, while the RNN's structure leads to more computations for the same effect; this makes LSTM the preferred choice over RNN.

3. System Analysis & Design

3.1. Aim

To implement a model which can perform word-by-word translation of a given English sentence into Hindi with accurate results.

3.2. LSTM Architecture details

  • The LSTM reads the data one sequence element after the other.
  • If the input is of length 'k', then it will read it as k time steps.
  • The main aim is to capture long- and short-term dependencies.
  • The whole system is simplified into 3 gates and 1 state (the corresponding update equations follow this list):
    • Forget Gate: decides whether information should be passed from the previous state (h(t-1)) to the next state (h(t)) or not
    • Input Gate: deals with the data used to update the cell state
    • Output Gate: decides the information that is relevant to the next hidden state
    • Cell state: the pointwise addition of the outputs of the Forget and Input gates updates the cell state
  • Input tokens are represented by {x1, x2, ..., xk}, hidden states by {h1, h2, ..., hk}, cell states by {c1, c2, ..., ck}, and output tokens by {y1, y2, ..., yk}.
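
For reference, the standard LSTM update equations behind these gates and the cell state (following the notation above; see reference [8]) are:

$$
\begin{aligned}
f_t &= \sigma(W_f\,[h_{t-1}, x_t] + b_f) && \text{(forget gate)}\\
i_t &= \sigma(W_i\,[h_{t-1}, x_t] + b_i) && \text{(input gate)}\\
\tilde{c}_t &= \tanh(W_c\,[h_{t-1}, x_t] + b_c) && \text{(candidate cell state)}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(pointwise cell-state update)}\\
o_t &= \sigma(W_o\,[h_{t-1}, x_t] + b_o) && \text{(output gate)}\\
h_t &= o_t \odot \tanh(c_t) && \text{(next hidden state)}
\end{aligned}
$$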

The LSTM translation system consists of 3 neural network architectures:

  • Encoder
  • Decoder Train Mode
  • Decoder Inference Mode

3.2.1. Encoder

  • The encoder basically summarizes the whole sentence.
  • The inputs to the encoder are the feature vectors of each word of the sentence, as shown in the figure.
  • After embedding the inputs, the encoder generates hidden and cell states at each time step and gives the final outputs y1, y2, ..., yk.
  • In this project, only the hidden and cell states are stored and the outputs are discarded, because the states hold the summarized sequence up to the previous step; only the states are needed for this application.

Figure 2-Encoder Model.
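
A minimal Keras sketch of such an encoder is shown below. The project's actual code appears in the figures of Section 3.7; the vocabulary size and state dimension here are assumed for illustration.

```python
# Encoder sketch (sizes are assumed, not the project's exact values).
from tensorflow.keras.layers import Input, LSTM, Embedding

num_encoder_tokens = 30000   # assumed English vocabulary size
latent_dim = 256             # assumed hidden/cell state size

encoder_inputs = Input(shape=(None,))                    # English word indices
enc_emb = Embedding(num_encoder_tokens, latent_dim)(encoder_inputs)
# return_state=True exposes the final hidden and cell states; the per-step
# outputs are discarded, exactly as described above.
encoder_outputs, state_h, state_c = LSTM(latent_dim, return_state=True)(enc_emb)
encoder_states = [state_h, state_c]                      # summary of the sentence
```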

3.2.2. Decoder

3.2.2.1. Decoder Train Mode

  • This accepts the final states of the encoder as its input states.
  • It is first given the 'START_' token, from which it starts generating words.
  • At the next time step, we give the actual Hindi translation words from the training labels as input, and it again predicts the next word (shown in the figure below).
  • This uses the "teacher forcing" method.

Figure 3-Decoder Train Model.
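
Continuing the encoder sketch above, a teacher-forced training decoder in Keras might look like this (dimensions again assumed):

```python
# Decoder (train mode) sketch, reusing encoder_inputs/encoder_states/latent_dim
# from the encoder sketch above.
from tensorflow.keras.layers import Input, LSTM, Embedding, Dense
from tensorflow.keras.models import Model

num_decoder_tokens = 30000   # assumed Hindi vocabulary size

decoder_inputs = Input(shape=(None,))    # Hindi labels, starting with START_
dec_emb_layer = Embedding(num_decoder_tokens, latent_dim)
dec_emb = dec_emb_layer(decoder_inputs)
# initial_state=encoder_states hands the sentence summary to the decoder;
# feeding the true previous word at each step is the "teacher forcing" method.
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_seq, _, _ = decoder_lstm(dec_emb, initial_state=encoder_states)
decoder_dense = Dense(num_decoder_tokens, activation="softmax")
decoder_outputs = decoder_dense(decoder_seq)

# Training model: (English sequence, shifted Hindi sequence) -> next Hindi word.
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
```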

3.2.2.2. Decoder Inference Mode

  • For testing, this application needs a slightly different architecture for the output decoder.
  • Since we don't have labels for testing purposes, we need to pass the predicted word as the next time step's input (shown in the figure below).

Figure 4-Decoder Inference Model.
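
A sketch of the inference-mode decoder, reusing the layers trained above and feeding each predicted word back in as the next input (the word-index dictionaries are hypothetical names):

```python
# Decoder (inference mode) sketch, reusing the trained layers from above.
import numpy as np
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model

encoder_model = Model(encoder_inputs, encoder_states)    # sentence -> states

state_h_in = Input(shape=(latent_dim,))
state_c_in = Input(shape=(latent_dim,))
dec_emb2 = dec_emb_layer(decoder_inputs)
dec_seq2, h, c = decoder_lstm(dec_emb2, initial_state=[state_h_in, state_c_in])
dec_probs = decoder_dense(dec_seq2)
decoder_model = Model([decoder_inputs, state_h_in, state_c_in], [dec_probs, h, c])

def decode_sequence(input_seq, max_len=50):
    """Greedy decoding: start from START_, stop at _END or max_len words."""
    state_h, state_c = encoder_model.predict(input_seq)
    target = np.array([[hindi_word_index["START_"]]])    # hypothetical dict
    words = []
    while len(words) < max_len:
        probs, state_h, state_c = decoder_model.predict([target, state_h, state_c])
        word = hindi_index_word[int(np.argmax(probs[0, -1, :]))]  # hypothetical dict
        if word == "_END":
            break
        words.append(word)
        target = np.array([[hindi_word_index[word]]])
    return " ".join(words)
```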

3.3. Impact of system

  • Video marketing can be even more effective overseas.
  • Translation helps in cutting across language barriers and interacting with people in other countries.

3.4. Dataset

  • Bilingual corpora
  • Part of a parallel corpus collection which contains multiple corpora.
  • Taken from about 37,726 TED talks, news articles, and Wikipedia articles.
  • Contains 124,318 unique English records and 97,662 Hindi records.
  • Corpus link: HindiEnglishCorpora
  • Cleaned dataset
  • It is sentence tokenized (sentence aligned)

Figure 5-Dataset Sample.

3.5. Preprocessing

  • Removal of duplicate and null records
  • Conversion to lowercase
  • Removal of quotes, special characters, and extra spaces
  • Computing the length of each English sentence and its corresponding Hindi sentence, and appending them to the dataset
  • Building a dictionary of all the words of Hindi and English
  • Adding 'START_' and '_END' tokens to the Hindi labels, so that the decoder can recognize the start and end of sentences (a sketch of these steps follows the list)
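
A sketch of these preprocessing steps with pandas (the file name and column names are assumed, not the project's exact ones):

```python
# Preprocessing sketch; "hindi_english_corpus.csv" and column names are assumed.
import pandas as pd

df = pd.read_csv("hindi_english_corpus.csv")
df = df.drop_duplicates().dropna()                       # duplicate/null records
df["english"] = df["english"].str.lower()                # lowercase English
for col in ("english", "hindi"):                         # quotes, specials, spaces
    df[col] = (df[col].str.replace(r"[\"']", "", regex=True)
                      .str.replace(r"[^\w\s]", " ", regex=True)
                      .str.replace(r"\s+", " ", regex=True).str.strip())
df["length_en"] = df["english"].str.split().str.len()    # sentence lengths
df["length_hi"] = df["hindi"].str.split().str.len()
english_words = {w for s in df["english"] for w in s.split()}   # vocabularies
hindi_words = {w for s in df["hindi"] for w in s.split()}
df["hindi"] = "START_ " + df["hindi"] + " _END"          # decoder boundary tokens
```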

3.6. Word Embedding

  • For feature vector representation, there are many techniques, such as learned word embeddings, word2vec, etc.
  • Here a word embedding is used to represent word features; it can be implemented using the Keras API in Python.
  • In an embedding, certain features are extracted automatically from the text: if the number of words is n and the number of features is m, then the matrix will be of size m×n.

Figure 6-Word Embedding Sample.

  • In the above example, n = 10K and m = 50.
  • The features are concepts like gender, royal, kind, etc.
  • A word having a particular feature has a high value in the matrix.
  • King, Queen, Man, and Woman all relate to Gender, so their corresponding values are high; but Man is not related to Royal, so that value is near zero.
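
In Keras, such an embedding is a trainable lookup table; a small illustration with the sizes above (the word indices are made up):

```python
# Embedding sketch: maps word indices to m-dimensional feature vectors
# (n = 10,000 words, m = 50 features, matching the example above).
import numpy as np
from tensorflow.keras.layers import Embedding

embedding = Embedding(input_dim=10_000, output_dim=50)
word_ids = np.array([[12, 407, 3]])     # made-up indices, e.g. "king queen man"
vectors = embedding(word_ids)           # tensor of shape (1, 3, 50)
```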

3.7. Code Snippets (Algorithm and Pseudocode)

  • Pre-processing

Figure 7-Preprocessing Code Snippet.

  • Batch Generation:

Figure 8-Batch Generation Pseudocode.

Figure 9-Batch Generation Code Snippet.
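
Figure 9 shows the project's snippet; as a hedged reconstruction, a batch generator for this kind of seq2seq training could look like the following (variable names are assumed):

```python
# Batch generator sketch: yields ((encoder_input, decoder_input), decoder_target).
# `pairs` is a list of (english_indices, hindi_indices); each Hindi sequence
# already starts with the START_ index and ends with the _END index.
import numpy as np

def generate_batch(pairs, batch_size=128):
    while True:                                   # loop forever for Keras fit()
        for i in range(0, len(pairs), batch_size):
            batch = pairs[i:i + batch_size]
            max_en = max(len(en) for en, _ in batch)
            max_hi = max(len(hi) for _, hi in batch)
            enc_in = np.zeros((len(batch), max_en), dtype="int32")
            dec_in = np.zeros((len(batch), max_hi), dtype="int32")
            dec_out = np.zeros((len(batch), max_hi), dtype="int32")
            for j, (en, hi) in enumerate(batch):
                enc_in[j, :len(en)] = en          # padded English indices
                dec_in[j, :len(hi)] = hi          # decoder input (starts at START_)
                dec_out[j, :len(hi) - 1] = hi[1:] # target: same sequence, shifted left
            yield (enc_in, dec_in), dec_out
```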

  • Model Building

Figure 10-Model Building Code Snippet.
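
Compiling and fitting the model sketched in Section 3.2 with the generator above might then look like this (the optimizer and loss are assumed, and train_pairs/val_pairs are hypothetical dataset splits; the project's actual code is in Figure 10):

```python
# Training sketch (assumed optimizer/loss; epoch count follows Section 4.1).
model.compile(optimizer="rmsprop",
              loss="sparse_categorical_crossentropy",   # integer word targets
              metrics=["accuracy"])
model.fit(generate_batch(train_pairs, batch_size=128),  # generator from above
          steps_per_epoch=len(train_pairs) // 128,
          validation_data=generate_batch(val_pairs, batch_size=128),
          validation_steps=len(val_pairs) // 128,
          epochs=50)                                    # ~50 epochs per Section 4.1
```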

  • Model Summary

Figure 11-Model Summary.

  • BLEU score computation
    • For n-gram precision calculation:

Figure 12-n-gram precision pseudocode.

  • Cumulative n-gram score:

Figure 13-Cumulative n-gram score pseudocode.

Figure 14-BLEU score code snippet.

3.8. Testing

For testing applications like neural machine translation, the BLEU algorithm is used.

BLEU score:

  • Bilingual Evaluation Understudy
  • A number between 0 and 1
  • The number indicates how similar the predicted text is to the reference text
  • 1 indicates maximum similarity
  • It can use unigrams, bigrams, ..., or any combination of them (here the 2-gram and 4-gram cumulative scores are used)
  • The algorithm is mentioned above

Example:

Reference 1: The cat is on the mat.

Reference 2: There is a cat on the mat.

Predicted Output (target): The cat the cat on the mat.

Bigram BLEU Score Computation

| Possible Bigrams | Frequency in Target Sentence | Clipped Frequency from the Reference Sentences |
| --- | --- | --- |
| the cat | 2 | 1 |
| cat the | 1 | 0 |
| cat on | 1 | 1 |
| on the | 1 | 1 |
| the mat | 1 | 1 |

Table 1-BLEU score computation example table.

  • The total bigram count in the target is 2+1+1+1+1 = 6
  • The clipped count from the references is 1+0+1+1+1 = 4
  • Bigram BLEU precision = clipped count / total count = 4/6 ≈ 0.67
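
The computation above can be reproduced directly. The sketch below computes the clipped bigram precision by hand and then the cumulative score with NLTK's sentence_bleu (the weights tuple selects the n-gram mix):

```python
# BLEU sketch: clipped bigram precision by hand, cumulative score via NLTK.
from collections import Counter
from nltk.translate.bleu_score import sentence_bleu

refs = ["the cat is on the mat".split(), "there is a cat on the mat".split()]
target = "the cat the cat on the mat".split()

def clipped_ngram_precision(references, candidate, n=2):
    cand_counts = Counter(zip(*[candidate[i:] for i in range(n)]))
    max_ref_counts = Counter()
    for ref in references:            # clip by the max count in any one reference
        for ngram, c in Counter(zip(*[ref[i:] for i in range(n)])).items():
            max_ref_counts[ngram] = max(max_ref_counts[ngram], c)
    clipped = sum(min(c, max_ref_counts[ngram]) for ngram, c in cand_counts.items())
    return clipped / sum(cand_counts.values())

print(clipped_ngram_precision(refs, target))     # 4/6, as in Table 1
# Cumulative 2-gram BLEU: geometric mean of the 1- and 2-gram precisions.
print(sentence_bleu(refs, target, weights=(0.5, 0.5)))
```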

3.9. System Deployment

The system is divided into the following steps:

  • Uploading the video in English language
  • Video to audio conversion
  • Audio to text conversion
  • English text to Hindi translation
  • Displaying the Translation

4. Results

4.1. Outputs

  • Preprocessing Results

Figure 15-Preprocessing output.

  • Model Training
    • After training the model for 50 epochs, the training loss starts to converge, which suggests that training need not continue beyond 50 epochs.

Figure 16-Trained Model Output.

  • Model Validation
    • After 30 epochs the model stops learning on the validation data, which suggests stopping the training of the model at 30 epochs.

Figure 17-Validation Curve output.

  • Model Testing

Figure 18-Test Example 1.

Here, the model is able to predict the Hindi translation for the given English sentence exactly matching the actual Hindi translation.

Figure 19-Test Example 2.

Here, the model predicts the Hindi translation for the given English sentence with a slight difference from the actual Hindi translation, due to differences in the semantics and syntax of the sentence.

  • BLEU Score

| Sr. No. | Dataset Name | Description | BLEU Score |
| --- | --- | --- | --- |
| 1 | HindiEnglishCorpora | Contains TED talks, news articles, and Wikipedia articles | 53.01% |
| 2 | Machine-Translation-English-To-Hindi | Contains day-to-day routinely used sentences | 36.9% |
| 3 | IIT Bombay English-Hindi Corpus Dataset | Contains Indian judicial statements and their corresponding translated sentences | 12% |

Table 2-Test Dataset Table.

4.2. Working End Product (Web App)

4.2.1. Technology Used

4.2.1.1. Frontend

  • HTML
  • CSS
  • React JS

4.2.1.2. Backend

  • Django
  • Django Rest-frameworks
  • Python
  • Tensorflow
  • Keras
  • Audiopy, pydub

4.2.2. Snapshots of Web App

  • Welcome (Intro) Screen :

This is the intro screen shown when you first reach our web app.

Figure 20-Front Screen.

  • About Us :

The About Us screen includes the details of the project members with their profile pictures (Meet the Team).

Figure 21-About us Screen.

  • Explore Page :

This is the main screen, where you can perform the desired task (video translation). The first snapshot shows the page before a video has been uploaded; there is an option to upload a video by clicking or by drag and drop. In the second snapshot a video has been uploaded, and there you can play it. At the end of the page you can see two divisions: one shows the extracted English sentences (converted from the video), and the other shows the translated Hindi text of the video.

Figure 22-Explore Screen output 1.

Figure 23-Explore Screen output 2.

5. Conclusion and Future Work

5.1. Conclusion

  • The implementation of the LSTM model for the purpose of video translation, i.e. neural machine translation, gave accurate results for videos similar to the training dataset.
  • There was some fluctuation in the results when translating videos different from the training dataset type.

5.2. Future Work

  • Implement the proposed model on different datasets.
  • Try out translation into other languages.
  • Train the model on a dataset with a variety of sentences from different videos.
  • Convert the translated sentences into video format.
  • Improve the model's performance by running it on more powerful computers, which can provide more computation power and more efficient results.
  • Change the speech-to-text conversion strategy.

6. References

6.1. Literature Review

[1] Translartisan. (2018, August 31). Rule-based machine translation. Retrieved from https://translartisan.wordpress.com/tag/rule-based-machine-translation/

[2] Kunchukuttan, A., Mehta, P., & Bhattacharyya, P. (2018). The IIT Bombay English-Hindi Parallel Corpus. 1-4. Retrieved May 19, 2018, from https://arxiv.org/pdf/1710.02855.pdf.

[3] Saini, S., & Sahula, V. (2018). Neural Machine Translation for English to Hindi. 2018 Fourth International Conference on Information Retrieval and Knowledge Management (CAMP). doi:10.1109/infrkm.2018.8464781

[4] S. P. Singh, A. Kumar, H. Darbari, L. Singh, A. Rastogi and S. Jain, "Machine translation using deep learning: An overview," 2017 International Conference on Computer, Communications and Electronics (Comptelix), 2017, pp. 162-167, doi: 10.1109/COMPTELIX.2017.8003957.

6.2. LSTM Architecture

[5] Lamba, H. (2019, February 17). Word Level English to Marathi Neural Machine Translation using Seq2Seq Encoder-Decoder LSTM Model. Retrieved from https://towardsdatascience.com/word-level-english-to-marathi-neural-machine-translation-using-seq2seq-encoder-decoder-lstm-model-1a913f2dc4a7

[6] Ranjan, R. (2020, January 16). Neural Machine Translation for Hindi-English: Sequence to sequence learning. Retrieved from https://medium.com/analytics-vidhya/neural-machine-translation-for-hindi-english-sequence-to-sequence-learning-1298655e334a

[7] V, B. (2020, November 16). A Comprehensive Guide to Neural Machine Translation using Seq2Sequence Modelling using PyTorch. Retrieved from https://towardsdatascience.com/a-comprehensive-guide-to-neural-machine-translation-using-seq2sequence-modelling-using-pytorch-41c9b84ba350

[8] Understanding LSTM Networks. (n.d.). Retrieved from https://colah.github.io/posts/2015-08-Understanding-LSTMs/

[9] Pedamallu, H. (2020, November 30). RNN vs GRU vs LSTM. Retrieved from https://medium.com/analytics-vidhya/rnn-vs-gru-vs-lstm-863b0b7b1573

[10] Mittal, A. (2019, October 12). Understanding RNN and LSTM. Retrieved from https://aditi-mittal.medium.com/understanding-rnn-and-lstm-f7cdf6dfc14e

[11] Culurciello, E. (2019, January 10). The fall of RNN / LSTM. Retrieved from https://towardsdatascience.com/the-fall-of-rnn-lstm-2d1594c74ce0

7. Appendix

7.1. List of Images

  • Figure 1: LSTM architecture
  • Figure 2: Encoder Model
  • Figure 3: Decoder Train Model
  • Figure 4: Decoder Inference Model
  • Figure 5: Dataset Sample
  • Figure 6: Word Embedding Sample
  • Figure 7: Preprocessing Code Snippet
  • Figure 8: Batch Generation Pseudocode
  • Figure 9: Batch Generation Code Snippet
  • Figure 10: Model Building Code Snippet
  • Figure 11: Model Summary
  • Figure 12: n-gram precision pseudocode
  • Figure 13: Cumulative n-gram score pseudocode
  • Figure 14: BLEU score code snippet
  • Figure 15: Preprocessing output
  • Figure 16: Trained Model Output
  • Figure 17: Validation Curve output
  • Figure 18: Test Example 1
  • Figure 19: Test Example 2
  • Figure 20: Front Screen
  • Figure 21: About us Screen
  • Figure 22: Explore Screen output 1
  • Figure 23: Explore Screen output 2

7.2. List of Tables

  • Table 1: BLEU score computation example table
  • Table 2: Test Dataset Table

