Zabaan

Prototype Translation system using Neural Machine Translation architectures

Primary language: JavaScript
License: BSD 3-Clause "New" or "Revised" License (BSD-3-Clause)

Language Translation using Zabaan

Zabaan is an Urdu word meaning dialect, tongue, or form of speech.


Zabaan is a prototype translation platform that combines human-in-the-loop (HITL) review with Neural Machine Translation (NMT) to suggest domain-specific translations of fire, electrical, and life safety texts. Zabaan was extensively trained on NFPA datasets such as Codes & Standards, Research, and Public Education & Outreach material, and it currently focuses on bidirectional English (EN) and Spanish (ES) translation. Zabaan was developed by NFPA's Data Analytics team in collaboration with NFPA's International Operations team, with support from WPI's GQP Program. The platform comes with a lightweight UI front end that provides instant translations and lets users edit incorrect translations suggested by the NMT model.
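The README does not spell out how the human-in-the-loop part interacts with the model. As a minimal sketch, assuming reviewer corrections are simply looked up before the NMT model is called, the hybrid flow could look like the Python below. `HITLTranslator`, `suggest`, and `correct` are hypothetical names, the NMT model is stood in for by any callable, and an in-memory dict takes the place of whatever store the platform actually uses (presumably MongoDB).

```python
from typing import Callable, Dict, Tuple

# Hypothetical HITL sketch: human corrections take priority over NMT output.
# `nmt_translate` stands in for the real model call (e.g. a TF Serving request).
NMTFunc = Callable[[str, str, str], str]


class HITLTranslator:
    def __init__(self, nmt_translate: NMTFunc) -> None:
        self.nmt_translate = nmt_translate
        # (source language, target language, source text) -> reviewed translation.
        # In-memory only; a real deployment would persist this in a database.
        self.corrections: Dict[Tuple[str, str, str], str] = {}

    def suggest(self, text: str, src: str = "en", tgt: str = "es") -> str:
        """Return a reviewer-approved translation if one exists, else ask the model."""
        key = (src, tgt, text.strip())
        if key in self.corrections:
            return self.corrections[key]
        return self.nmt_translate(text, src, tgt)

    def correct(self, text: str, fixed: str, src: str = "en", tgt: str = "es") -> None:
        """Store a reviewer's edit so later requests for the same text return it."""
        self.corrections[(src, tgt, text.strip())] = fixed


if __name__ == "__main__":
    # Stand-in model that just tags its input; replace with a real NMT call.
    translator = HITLTranslator(lambda text, src, tgt: "[MT] " + text)
    print(translator.suggest("smoke alarm"))   # [MT] smoke alarm
    translator.correct("smoke alarm", "alarma de humo")
    print(translator.suggest("smoke alarm"))   # alarma de humo
```

Keying corrections by language pair keeps an EN-to-ES fix from being returned for an ES-to-EN request of the same string.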

Built With

  • OpenNMT-tf - A general purpose sequence learning toolkit using TensorFlow
  • TensorFlow Serving - A flexible, high-performance serving system for machine learning models
  • Tornado Web Framework - A Python web framework and asynchronous networking library
  • MongoDB - A document-based, distributed database built for modern application developers

Getting Started

I. Project Setup

  1. Clone the Repository
git clone https://github.com/NFPA/Zabaan.git
cd Zabaan/Serving
  1. Activate the Python 3.6 environment (assuming you are using an EC2 instance with the Deep Learning AMI).
source activate tensorflow_p36
  1. Install packages. This installs the Python packages for tokenization, the TF Serving API 1.x, and PyMongo.
pip install -r requirements.txt
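To confirm the install worked, a quick check like the one below can be run inside the activated environment. It is a sketch: the module names `tornado`, `pymongo`, and `tensorflow_serving` are assumed to correspond to the packages named above (the TF Serving API package installs as `tensorflow_serving`), `requirements.txt` remains the authoritative list, and the tokenization package is left out because the README does not name it.

```python
import importlib.util
from typing import Iterable, List

# Assumed import names for Tornado, PyMongo, and the TF Serving API package.
REQUIRED_MODULES = ["tornado", "pymongo", "tensorflow_serving"]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```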
  1. Pull the latest MongoDB Docker image and map its data directory to a local directory. Change the volume path and port number as needed.
mkdir mongodb
docker run --name gqp-mongo -d -v /home/ubuntu/Zabaan/Serving/mongodb:/data/db -p 27017:27017 mongo:latest

Once the MongoDB container has been created, later sessions only need to start it again by container name or ID; there is no need to pull the image again.

docker start <container_name/id>
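Before starting TensorFlow Serving, it can help to confirm that MongoDB is accepting connections on the published port. The sketch below only tests that a TCP connection to `localhost:27017` succeeds, using the standard library; it does not authenticate or issue a MongoDB command (with PyMongo installed, `MongoClient(...).admin.command("ping")` is a stronger check). The function name `port_is_open` is illustrative.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection raises OSError (including timeouts) if nothing answers.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 27017 is the host port published by `docker run ... -p 27017:27017`.
    print("MongoDB port reachable:", port_is_open("localhost", 27017))
```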
  1. Copy all the serving models into the nfpa_models folder. You can download the EuroParl model and the NFPA model.
cp /home/ubuntu/demo/models/* /home/ubuntu/Zabaan/Serving/nfpa_models/

All models were trained with OpenNMT-tf. For more details on the OpenNMT-tf SavedModel format and on creating and serving OpenNMT models, please see OpenNMT Serving.

For more details on serving TensorFlow models, please see TensorFlow Serving.
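TensorFlow Serving loads a model from numbered version subdirectories under its `base_path`, each holding a SavedModel, and by default serves only the highest version. The hypothetical helper below lists the versions it would find, which is a quick way to check that the copied models have that layout before starting the server. The example folder name `euro_attention` in the `__main__` block is an assumption taken from the log output further down, not a layout this README confirms.

```python
import os
from typing import List

# Files that mark a directory as a SavedModel export.
SAVED_MODEL_FILES = ("saved_model.pb", "saved_model.pbtxt")


def servable_versions(model_base_path: str) -> List[int]:
    """List integer version directories under a model's base path that contain
    a SavedModel, sorted ascending. Under TF Serving's default "latest" version
    policy, the last entry is the version that gets served."""
    versions = []
    for entry in os.listdir(model_base_path):
        export_dir = os.path.join(model_base_path, entry)
        if entry.isdecimal() and any(
            os.path.isfile(os.path.join(export_dir, f)) for f in SAVED_MODEL_FILES
        ):
            versions.append(int(entry))
    return sorted(versions)


if __name__ == "__main__":
    # Hypothetical folder name; adjust to the model you copied.
    path = "/home/ubuntu/Zabaan/Serving/nfpa_models/euro_attention"
    if os.path.isdir(path):
        print(servable_versions(path))
```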

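  1. Check that models.config contains the configuration of the model you want to serve. The base_path is resolved inside the TensorFlow Serving container, so it should point under the /models/nfpa_models mount target used in the next step.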
model_config_list {
  config {
    name: "name_of_the_model"
    base_path: "/path/to/model"  # path as seen inside the container
    model_platform: "tensorflow"
  }
}
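When more than one model should be served (for example, the EuroParl and NFPA models), models.config holds one `config` entry per model inside a single `model_config_list`. The sketch below renders such a file from a name-to-path mapping. `build_model_config` and the example model names and paths are hypothetical; each `base_path` is written as seen inside the container, under the `/models/nfpa_models` mount target used by the `nvidia-docker run` command.

```python
from typing import Dict


def build_model_config(models: Dict[str, str]) -> str:
    """Render a TensorFlow Serving model_config_list in protobuf text format.

    `models` maps each model name to its base_path as seen inside the
    container (the host folder is mounted at /models/nfpa_models).
    """
    entries = []
    for name, base_path in models.items():
        entries.append(
            "  config {\n"
            f'    name: "{name}"\n'
            f'    base_path: "{base_path}"\n'
            '    model_platform: "tensorflow"\n'
            "  }"
        )
    return "model_config_list {\n" + "\n".join(entries) + "\n}\n"


if __name__ == "__main__":
    # Hypothetical model names and folder layout.
    print(build_model_config({
        "euro_attention": "/models/nfpa_models/euro_attention",
        "nfpa_model": "/models/nfpa_models/nfpa_model",
    }))
```

Writing the printed text to nfpa_models/models.config would then let a single TensorFlow Serving container serve both models.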
  1. With the MongoDB container running, start a TensorFlow Serving GPU instance in the background. Note: change the source path to the absolute path on your machine. Once the container has started, check its logs to confirm that it started successfully.
nvidia-docker run --name tf_server -d --rm -p 8500:8500 --mount type=bind,source=/home/ubuntu/Zabaan/Serving/nfpa_models/,target=/models/nfpa_models -t tensorflow/serving:1.11.0-gpu --model_config_file=/models/nfpa_models/models.config

Verify that the TF Serving container started by checking its logs:

 docker container logs tf_server

The end of the log should look something like this:

2020-11-16 20:50:35.246427: I tensorflow_serving/core/loader_harness.cc:86] Successfully loaded servable version {name: euro_attention version: 1564872567}
2020-11-16 20:50:35.251353: I tensorflow_serving/model_servers/server.cc:285] Running gRPC ModelServer at 0.0.0.0:8500 ...
[warn] getaddrinfo: address family for nodename not supported
2020-11-16 20:50:35.255347: I tensorflow_serving/model_servers/server.cc:301] Exporting HTTP/REST API at:localhost:8501 ...
[evhttp_server.cc : 235] RAW: Entering the event loop ...
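Rather than reading the log by eye, the two lines that matter (the servable being loaded and the gRPC server listening) can be checked programmatically. The sketch below parses log text in the format shown above; the function names are hypothetical, and the regular expression is fitted to this sample output, so other TensorFlow Serving versions may phrase these lines differently.

```python
import re
from typing import List, Tuple

# Fitted to the sample log above, e.g.
# "Successfully loaded servable version {name: euro_attention version: 1564872567}"
LOADED_RE = re.compile(
    r"Successfully loaded servable version \{name: (\S+) version: (\d+)\}"
)


def loaded_servables(log_text: str) -> List[Tuple[str, int]]:
    """Return (model name, version) pairs that TF Serving reports as loaded."""
    return [(name, int(version)) for name, version in LOADED_RE.findall(log_text)]


def grpc_ready(log_text: str) -> bool:
    """True once the log reports that the gRPC ModelServer is running."""
    return "Running gRPC ModelServer at" in log_text
```

To use it, pass in the output of `docker container logs tf_server 2>&1`; TensorFlow Serving writes its log to stderr, hence the redirect.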
  1. Start the web server. The endpoints mapped in server.py call the required functions and models from the client file.
python server.py --port 8500 --model_name euro_attention
  1. The application should now be running at localhost:8080.
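To confirm the whole stack is up from a script (for example, after restarting the instance), a minimal HTTP check against the UI address could look like this. It assumes the UI answers a plain `GET /` on port 8080 with HTTP 200, which the README does not state explicitly; `app_is_up` is a hypothetical name.

```python
import urllib.request


def app_is_up(url: str = "http://localhost:8080/", timeout: float = 3.0) -> bool:
    """Return True if `url` answers with HTTP 200 within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        return False


if __name__ == "__main__":
    print("Zabaan UI reachable:", app_is_up())
```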

Results

BLEU scores on NFPA content, before and after domain adaptation to NFPA data.

                           En-Es   Es-En   No. of Sentences (Train / Dev / Test)
Before Domain Adaptation   35.98   41.3    1.7M / 1000 / 500
After Domain Adaptation    65.89   73.25   93k / 1000 / 1000
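Domain adaptation raised BLEU by 29.91 points for En-Es (35.98 to 65.89) and by 31.95 points for Es-En (41.3 to 73.25). The README does not say which BLEU implementation produced these scores (tools such as sacreBLEU and Moses' multi-bleu.perl tokenize differently), so the from-scratch sketch below is only meant to show what the metric computes: clipped n-gram precisions for n = 1 to 4, combined by a geometric mean and multiplied by a brevity penalty, over whitespace-tokenized text with a single reference and no smoothing. Its scores should not be expected to match the table exactly.

```python
import math
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses: List[str], references: List[str], max_n: int = 4) -> float:
    """Corpus-level BLEU on a 0-100 scale (one reference per sentence, no smoothing)."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clipped ("modified") matches: a hypothesis n-gram counts at most
            # as often as it appears in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += max(len(h) - n + 1, 0)
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize hypotheses shorter than the references.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * brevity_penalty * math.exp(log_precision)
```

Because BLEU aggregates counts over the whole test set before taking precisions, it is computed at corpus level here rather than by averaging per-sentence scores.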

Acknowledgments