/google-translator-performance-analyzer

Base code for analyzing the performance of Google Translate using BLEU, dependency parse trees, two-way ANOVA, and other quality engineering theories

Primary Language: Python

Quality Engineering Term Project
- Quality Evaluation of Google Translator

Analysis Flow and Development Guide

  1. Download the test data from the English-Spanish parallel corpus (the data comes from the UN)

  2. Translate the source language (English) into the target language (Spanish) using Google Cloud Translator

  3. Calculate RIBES score using NLTK

  4. Calculate features on each sentence

  5. Quality Engineering analysis on the processed data

Input Format of csv file

The input csv file must follow the format below; otherwise the program may fail with an error.

english | spanish
My name is john. | (Spanish true sentence of Target)
... | ...
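
The input format above can be checked before running the rest of the pipeline. The following is a minimal sketch, assuming the file is comma-delimited with a header row; `read_pairs` and the sample sentence are made up for illustration and are not part of this repository.

```python
import csv
import io

EXPECTED_COLUMNS = ["english", "spanish"]


def read_pairs(handle):
    """Return (english, spanish) tuples, raising ValueError on a bad header."""
    reader = csv.DictReader(handle)
    # fieldnames is None for an empty file, so guard before slicing.
    if reader.fieldnames is None or list(reader.fieldnames[:2]) != EXPECTED_COLUMNS:
        raise ValueError("csv header must start with: english, spanish")
    return [(row["english"], row["spanish"]) for row in reader]


# Illustrative in-memory file; a real run would use open(path, newline="").
sample = io.StringIO("english,spanish\nMy name is John.,Me llamo John.\n")
pairs = read_pairs(sample)
```

Failing early on a wrong header avoids paying for translation calls on a file the later steps cannot use.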

Output Format of csv file

english | spanish | translated_spanish | ribes_score | number_of_words | number_of_alphabets | noun | adj | verb | adp | conj | height_of_parse_tree
My name is john. | (Spanish true sentence of Target) | (Sentence generated by Google Translator) | 0.2232 | 3 | ... | ... | ... | ... | ... | ... | ...
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ...
  • RIBES score: Quality score of the translated sentence, measured against the true Spanish sentence.
  • Feature #n: Features we define, such as the height of the dependency parse tree, in order to examine relationships among features. They are used for ANOVA, or for building orthogonal arrays in DOE or the Taguchi method.
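
To make a few of these features concrete, here is a minimal sketch in plain Python. It assumes whitespace tokenization for the word count and assumes a dependency parse is already available as a list of head indices (with -1 marking the root); the function names are hypothetical, and the repository's own feature code may count differently.

```python
def count_words(sentence):
    """Number of whitespace-separated tokens."""
    return len(sentence.split())


def count_letters(sentence):
    """Number of alphabetic characters, ignoring spaces and punctuation."""
    return sum(ch.isalpha() for ch in sentence)


def tree_height(heads):
    """Height of a dependency tree given as head indices.

    heads[i] is the index of token i's head, or -1 if token i is the root.
    The height is the number of edges on the longest path from the root.
    """
    def depth(i):
        d = 0
        while heads[i] != -1:
            i = heads[i]
            d += 1
        return d

    return max((depth(i) for i in range(len(heads))), default=0)


# "The cat sleeps": The -> cat -> sleeps (root), written by hand for illustration.
heads = [1, 2, -1]
features = (count_words("The cat sleeps."), count_letters("The cat sleeps."), tree_height(heads))
```

The part-of-speech counts (noun, adj, verb, adp, conj) need a tagger and are left out of this sketch.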

Setup

Authentication

Authentication for this service is done via an API Key. To obtain an API Key:

  1. Open the Cloud Platform Console

  2. Make sure that billing is enabled for your project.

  3. From the Credentials page, create a new API Key or use an existing one for your project.

  4. Set the environment variable before starting the program, like this:

    $ export GOOGLE_APPLICATION_CREDENTIALS=path_to_service_account_file
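
A script can confirm that the variable is set before it makes any request. This is a minimal sketch; `credentials_path` is a hypothetical helper, not a function from this repository or from the Google client library.

```python
import os


def credentials_path(env=None):
    """Return the configured credentials path or raise if it is missing."""
    env = os.environ if env is None else env
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    return path
```

Calling `credentials_path()` at startup turns a late authentication failure into an immediate, readable error.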

Install Dependencies

  1. Install pip and virtualenv if you do not already have them.

  2. Create a virtualenv. Samples are compatible with Python 3.4+.

    $ virtualenv -p python3 env
    $ source env/bin/activate

  3. Install the dependencies needed to run the samples.

    $ pip install -r requirements.txt

Samples

For step 1,

To make an input file for the program, run step 1.

$ python initialize_test.py ./data/es-en.csv 100

This will output a csv file in the format above from the data set named 'es-en.csv'. The last argument, '100', limits the output to 100 sentences from the data set.
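
The row limit can be pictured as follows. This sketch assumes the first N rows after the header are kept; the README does not say whether initialize_test.py takes the first rows or samples them, and `take_rows` is a hypothetical name.

```python
import csv
import io
from itertools import islice


def take_rows(handle, limit):
    """Return the header and at most `limit` data rows from a csv file."""
    reader = csv.reader(handle)
    header = next(reader)  # first line holds the column names
    return header, list(islice(reader, limit))


# Illustrative in-memory data; a real run would open ./data/es-en.csv.
data = io.StringIO("english,spanish\na,b\nc,d\ne,f\n")
header, rows = take_rows(data, 2)
```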

Main Function including steps 2, 3, 4

To run the main program with the csv file ./data/es-en.csv:

$ python main.py ./data/es-en.csv

Then it will output ./result/es-en.csv.

If the file passed as an argument is not a csv file, it will print the message "Input file is not a csv file."
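
One way such a check and the result path could be written is sketched below. It assumes the check looks only at the file extension and that the output keeps the input's base name; both helpers are hypothetical, and main.py may do this differently.

```python
import os


def is_csv_path(path):
    """True when the file name ends in .csv (case-insensitive)."""
    return os.path.splitext(path)[1].lower() == ".csv"


def result_path(path, result_dir="./result"):
    """Map an input file to a file of the same name under the result directory."""
    return os.path.join(result_dir, os.path.basename(path))
```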

For step 2,

To run the program with the csv file ./data/es-en.csv:

 $ python translator_csv.py ./data/es-en.csv

Then it will output ./result/es-en.csv.
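
The translation step can be sketched as below, assuming the `google-cloud-translate` package and its `translate_v2` client; translator_csv.py may be organized differently. The translator is passed in as a function so the column logic can be tried without network access, and both function names are hypothetical.

```python
def make_google_translator(source="en", target="es"):
    """Build a text -> translated text function backed by Cloud Translation.

    The import is done lazily so this module loads even when
    google-cloud-translate is not installed.
    """
    from google.cloud import translate_v2 as translate

    client = translate.Client()  # reads GOOGLE_APPLICATION_CREDENTIALS

    def translate_one(text):
        result = client.translate(
            text,
            source_language=source,
            target_language=target,
            format_="text",  # plain text, so the result is not HTML-escaped
        )
        return result["translatedText"]

    return translate_one


def translate_column(sentences, translate_one):
    """Translate each sentence in order and return the new column."""
    return [translate_one(sentence) for sentence in sentences]
```

With credentials configured, `translate_column(english_sentences, make_google_translator())` would produce the values for the translated_spanish column.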

For step 3,

To run the program with the csv file ./data/es-en.csv, which was generated in step 2:

 $ python calculate_ribes.py ./data/es-en.csv

Then it will output ./result/es-en.csv.
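
A minimal sketch of a per-sentence RIBES computation with NLTK's `sentence_ribes` follows. It assumes lowercased whitespace tokenization and returns 0.0 for an empty translation; both are choices made for this sketch and are not confirmed to match calculate_ribes.py.

```python
def tokenize(sentence):
    """Lowercase and split on whitespace (an assumption of this sketch)."""
    return sentence.lower().split()


def ribes(reference, hypothesis):
    """RIBES score of one translated sentence against one reference sentence."""
    hyp_tokens = tokenize(hypothesis)
    if not hyp_tokens:
        # An empty hypothesis would make the brevity penalty divide by zero.
        return 0.0
    # Imported lazily so the helpers above work without NLTK installed.
    from nltk.translate.ribes_score import sentence_ribes

    # sentence_ribes takes a list of tokenized references and one tokenized hypothesis.
    return sentence_ribes([tokenize(reference)], hyp_tokens)
```

Applied row by row, `ribes(row["spanish"], row["translated_spanish"])` would give the value for the ribes_score column.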

For step 4,

To run the program with the csv file ./data/es-en.csv, which was generated in step 3:

 $ python calculate_features.py ./data/es-en.csv

Then it will output ./result/es-en.csv.

References