YoungNMT

Young Neural Machine Translation Framework






YoungNMT is a young but low-coupling, flexible, and scalable neural machine translation system. It is designed so that researchers and developers can realize their ideas quickly without changing the original system.


Documentation

2020.10.10

Version 0.1.0 has some known bugs (they do not affect normal use of YoungNMT):

  • an exception when loading user-defined HOCON files;
  • a logging exception in the BLEU scorer.

Table of Contents

  • Dependencies
  • Installation
  • Arguments
  • Quickstart
  • Models and Configurations
  • Citation

Dependencies

Required

Core dependencies such as PyTorch are better configured and installed by the user manually.

The following dependency packages will be installed automatically during system installation. If errors occur, please configure them manually:

  • pyhocon is used to parse configuration files.
  • visdom is used to visualize the training process (see the sketch below).
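
For reference, here is a minimal, standalone sketch of how visdom is typically used to plot a curve during training. It is an illustration only, not YoungNMT's own visualization code, and the environment/window names are made up:

import visdom

# Connect to a running visdom server (start one with: python -m visdom.server).
vis = visdom.Visdom(env='demo')

# Append points of a dummy loss curve to the same window, one per step.
for step, loss in enumerate([4.0, 3.2, 2.7, 2.5]):
    vis.line(X=[step], Y=[loss], win='loss',
             update='append' if step > 0 else None,
             opts={'title': 'training loss'})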

Optional

  • NCCL is used to train models on NVIDIA GPUs.
  • apex is used to train models with mixed precision.
  • pynvml is used to manage and monitor NVIDIA GPUs (see the sketch below).
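
As an illustration of the kind of monitoring pynvml enables (not code taken from YoungNMT), the sketch below prints the memory usage of each visible NVIDIA GPU:

from pynvml import (nvmlInit, nvmlShutdown, nvmlDeviceGetCount,
                    nvmlDeviceGetHandleByIndex, nvmlDeviceGetName,
                    nvmlDeviceGetMemoryInfo)

nvmlInit()
for index in range(nvmlDeviceGetCount()):
    handle = nvmlDeviceGetHandleByIndex(index)
    name = nvmlDeviceGetName(handle)
    if isinstance(name, bytes):  # older pynvml versions return bytes
        name = name.decode()
    memory = nvmlDeviceGetMemoryInfo(handle)
    print(f"GPU {index} ({name}): "
          f"{memory.used / 2**20:.0f} MiB used / {memory.total / 2**20:.0f} MiB total")
nvmlShutdown()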

Installation

Three different installation methods are shown below:

  1. Install YoungNMT from PyPI:
pip install YoungNMT
  2. Install YoungNMT from sources:
git clone https://github.com/Jason-Young-AI/YoungNMT.git
cd YoungNMT
python setup.py install
  3. Develop YoungNMT locally:
git clone https://github.com/Jason-Young-AI/YoungNMT.git
cd YoungNMT
python setup.py build develop
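
Whichever method you choose, a quick way to confirm that the console commands described in the next section ended up on your PATH is the following standard-library-only sketch:

import shutil

# The three entry points provided by YoungNMT (see the Arguments section).
for command in ("ynmt-preprocess", "ynmt-train", "ynmt-test"):
    location = shutil.which(command)
    print(f"{command}: {location if location else 'not found on PATH'}")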

Arguments

In YoungNMT, we built a module that encapsulates pyhocon and parses files written in HOCON style to obtain system arguments. HOCON (Human-Optimized Config Object Notation) is a superset of JSON, so YoungNMT can load arguments from *.json files as well as pure HOCON files.
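
To illustrate the point, here is a standalone pyhocon sketch (the keys are made up for illustration and are not YoungNMT's actual argument schema): the same settings can be written as plain JSON or in the more relaxed HOCON syntax, and pyhocon parses both.

from pyhocon import ConfigFactory

# Plain JSON is valid HOCON.
json_style = ConfigFactory.parse_string('{"train": {"batch_size": 4096, "device": "gpu"}}')

# The same settings in HOCON style: quotes and commas are optional,
# and values can be reused via substitutions.
hocon_style = ConfigFactory.parse_string("""
train {
  batch_size = 4096
  device = gpu
  accum_size = ${train.batch_size}
}
""")

print(json_style.get_int("train.batch_size"))   # 4096
print(hocon_style.get_string("train.device"))   # gpu
print(hocon_style.get_int("train.accum_size"))  # 4096, resolved from the substitution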

After installation, the commands ynmt-preprocess, ynmt-train and ynmt-test can be executed directly, and system arguments will be loaded from the default HOCON files.

Save Arguments

ynmt-preprocess -s {path to save args} -t {json|yaml|properties|hocon}

Load Arguments

ynmt-preprocess -l {user's config file}

Quickstart

See Full Documentation for more details.

Here is an example of a WMT16 English-to-Romanian experiment.

Step 0. Preliminaries

  • Download the English-Romania corpora archive (English-Romania.zip) from OneDrive;
  • Download the English-Romania configuration file from YoungNMT-configs.

Step 1. Dataset preparation

unzip -d Corpora English-Romania.zip
mkdir Datasets
ynmt-preprocess -l wmt16_en-ro_config/main.hocon

Step 2. Train the model on 4 GPUs

mkdir -p Checkpoints/WMT16_En-Ro
CUDA_VISIBLE_DEVICES=0,1,2,3 ynmt-train -l wmt16_en-ro_config/main.hocon

Step 3. Test the model using 1 GPU

mkdir -p Outputs/WMT16_En-Ro
CUDA_VISIBLE_DEVICES=0 ynmt-test -l wmt16_en-ro_config/main.hocon

Models and Configurations

We provide pre-trained models and their configurations for several tasks. Please refer to YoungNMT-configs.

Citation