1. Download the test data, an English-Spanish parallel corpus from the UN.
2. Translate the source language (English) into the target language (Spanish) using Google Cloud Translator.
3. Calculate the RIBES score using NLTK.
4. Calculate features for each sentence.
5. Run the quality engineering analysis on the processed data.

The input csv file must use the following format; the program will not run correctly otherwise.

english | spanish |
---|---|
My name is john. | (Spanish true sentence of Target) |
... | ... |
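
Before running the pipeline, it can help to confirm that a file's header contains both required columns. The following is a minimal sketch using only the standard library. It assumes a comma-delimited file, and `has_required_columns` is a hypothetical helper, not part of this repository.

```python
import csv
import io

REQUIRED_COLUMNS = ("english", "spanish")


def has_required_columns(csv_file):
    """Return True if the CSV header contains every required column."""
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []  # fieldnames is None for an empty file
    return all(column in header for column in REQUIRED_COLUMNS)


# Usage with an in-memory file; pass open(path, newline="") for a real file.
sample = io.StringIO("english,spanish\nMy name is john.,Mi nombre es john.\n")
print(has_required_columns(sample))  # True
```
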

The program writes an output csv file in the following format:

english | spanish | translated_spanish | ribes_score | number_of_words | number_of_alphabets | noun | adj | verb | adp | conj | height_of_parse_tree |
---|---|---|---|---|---|---|---|---|---|---|---|
My name is john. | (Spanish true sentence of Target) | (Sentence generated by Google Translator) | 0.2232 | 3 | ... | ... | ... | ... | ... | ... | ... |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
- RIBES score: a measure of the quality of the translated sentence, computed against the reference Spanish sentence.
- Feature #n: sentence-level features that we define, such as the height of the dependency parse tree, to examine relationships among features. They will be used for ANOVA, or for building an orthogonal array in DOE or the Taguchi method.
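
To make a few of these features concrete, here is a minimal sketch using only the standard library. The function bodies are assumptions for illustration: words are split on whitespace, and the dependency tree is represented as a list of head indices (root marked with `-1`) that a parser would normally supply. The repository's own scripts may compute these values differently.

```python
def number_of_words(sentence):
    """Count whitespace-separated tokens (assumed tokenization)."""
    return len(sentence.split())


def number_of_alphabets(sentence):
    """Count alphabetic characters, ignoring spaces, digits and punctuation."""
    return sum(1 for ch in sentence if ch.isalpha())


def height_of_parse_tree(heads):
    """Height, in edges, of a dependency tree given as head indices.

    heads[i] is the index of token i's head; the root has head -1.
    Assumes a well-formed tree (no cycles).
    """
    def depth(i):
        d = 0
        while heads[i] != -1:
            i = heads[i]
            d += 1
        return d

    return max((depth(i) for i in range(len(heads))), default=0)


# "I like green apples.": I -> like, like = root, green -> apples, apples -> like
print(number_of_words("I like green apples."))      # 4
print(number_of_alphabets("I like green apples."))  # 16
print(height_of_parse_tree([1, -1, 3, 1]))          # 2
```
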
Authentication for this service is done via an API Key. To obtain an API Key:
- Open the Cloud Platform Console.
- Make sure that billing is enabled for your project.
- From the Credentials page, create a new API Key or use an existing one for your project.
- Set the environment variable before starting the program, like this:

      $ export GOOGLE_APPLICATION_CREDENTIALS=path_to_service_account_file
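
A forgotten export usually surfaces only as an authentication error deep inside the client library, so a script can check for the variable up front. The sketch below is one hypothetical way to do that; `credentials_path` is not part of this repository.

```python
import os


def credentials_path(environ=os.environ):
    """Return the configured credentials path, or raise if it is unset."""
    path = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    return path


# Usage with an explicit mapping; call credentials_path() to read the real environment.
print(credentials_path({"GOOGLE_APPLICATION_CREDENTIALS": "/tmp/key.json"}))
```
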

- Install pip and virtualenv if you do not already have them.
- Create a virtualenv. Samples are compatible with Python 3.4+.

      $ virtualenv -p python3 env
      $ source env/bin/activate

- Install the dependencies needed to run the samples.

      $ pip install -r requirements.txt

To create an input file for the program, run step 1:

    $ python initialize_test.py ./data/es-en.csv 100

This writes a csv file in the input format above from the data set named 'es-en.csv'. The last argument, '100', limits the output to 100 sentences from the data set.

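
The effect of the last argument can be pictured with the sketch below, which copies the header and at most N data rows of a CSV. Whether `initialize_test.py` takes the first N rows or selects them some other way is not stated here, so taking the first N is an assumption, and `take_rows` is a hypothetical helper.

```python
import csv
import io
from itertools import islice


def take_rows(src, dst, limit):
    """Copy the header plus at most `limit` data rows from src to dst.

    Returns the number of data rows written.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst)
    header = next(reader, None)
    if header is None:  # empty input: nothing to copy
        return 0
    writer.writerow(header)
    count = 0
    for row in islice(reader, limit):
        writer.writerow(row)
        count += 1
    return count


# Usage with in-memory files; real files should be opened with newline="".
source = io.StringIO("english,spanish\na,b\nc,d\ne,f\n")
target = io.StringIO()
print(take_rows(source, target, 2))  # 2
```
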
To run the main program with the csv file `./data/es-en.csv`:

    $ python main.py ./data/es-en.csv

It then writes `./result/es-en.csv`.

If the file passed as an argument is not a csv file, it prints `Input file is not a csv file.`

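
The input check and the output location described above can be sketched as follows. This is an illustration, not the code in `main.py`: the helper name `result_path` is hypothetical, the check is assumed to look only at the file extension, and the sketch raises an exception where the real program prints the message.

```python
import os


def result_path(input_path, result_dir="./result"):
    """Map an input CSV path to its output path, rejecting non-CSV input."""
    name = os.path.basename(input_path)
    if not name.lower().endswith(".csv"):
        raise ValueError("Input file is not a csv file.")
    return result_dir + "/" + name


print(result_path("./data/es-en.csv"))  # ./result/es-en.csv
```
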
To run the program with the csv file `./data/es-en.csv`:

    $ python translator_csv.py ./data/es-en.csv

It then writes `./result/es-en.csv`.

To run the program with the csv file `./data/es-en.csv`, which is the file generated in step 2:

    $ python calculate_ribes.py ./data/es-en.csv

It then writes `./result/es-en.csv`.

To run the program with the csv file `./data/es-en.csv`, which is the file generated in step 3:

    $ python calculate_features.py ./data/es-en.csv

It then writes `./result/es-en.csv`.
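
Each of these steps reads a csv file and writes it back with extra columns. The general pattern can be sketched with the standard library as shown below. The helper `add_column` and the letter-counting function are assumptions for illustration; the actual scripts may be organized differently.

```python
import csv
import io


def add_column(src, dst, source_field, new_field, func):
    """Copy CSV rows from src to dst, adding new_field = func(row[source_field])."""
    reader = csv.DictReader(src)
    fieldnames = list(reader.fieldnames or []) + [new_field]
    writer = csv.DictWriter(dst, fieldnames=fieldnames)
    writer.writeheader()
    for row in reader:
        row[new_field] = func(row[source_field])
        writer.writerow(row)


def count_letters(sentence):
    """Count alphabetic characters in a sentence."""
    return sum(1 for ch in sentence if ch.isalpha())


# Usage with in-memory files; real files should be opened with newline="".
source = io.StringIO("english,spanish\nMy name is john.,Mi nombre es john.\n")
target = io.StringIO()
add_column(source, target, "english", "number_of_alphabets", count_letters)
print(target.getvalue())
```

In this example the output keeps the original two columns and gains a `number_of_alphabets` column whose value for the sample row is 12.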