
English-Vietnamese Machine Translation using Transformer (PyTorch)


English-Vietnamese Translation using Transformer

This is our final project for the Statistical Learning course at VNUHCM - University of Science.

Our report, which explains the Transformer in detail, is available at this link: Google Drive

1. Installation

You need CUDA 11.x installed.

Installation commands:

conda create -n trans python=3.9
conda activate trans
bash install.sh
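After running install.sh, you can confirm that the installed PyTorch build targets CUDA 11.x. The sketch below is a small helper that is not part of this repository (detected_cuda_version and is_cuda_11 are hypothetical names). It reads torch.version.cuda, which is the CUDA version PyTorch was built against (None for CPU-only builds), and returns None if PyTorch is not installed. The system CUDA toolkit version itself is printed by nvcc --version.

```python
import importlib.util


def is_cuda_11(version):
    """Return True if a CUDA version string such as "11.8" is in the 11.x series."""
    if not version:
        return False
    return version.split(".")[0] == "11"


def detected_cuda_version():
    """Return the CUDA version PyTorch was built with, or None.

    None means PyTorch is not installed or is a CPU-only build.
    """
    if importlib.util.find_spec("torch") is None:
        return None
    import torch
    return torch.version.cuda


if __name__ == "__main__":
    version = detected_cuda_version()
    print("PyTorch CUDA version:", version)
    print("CUDA 11.x:", is_cuda_11(version))
```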

2. Demo Application

Download the trained weights from this link and put them in the ./runs folder.

Expected structure:

    |-- <folder_name>/
        |-- config.yaml
        |-- best.pt
        |-- src_field.pt
        |-- trg_field.pt
        |-- ...


streamlit run app.py ./runs/<folder_name>
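Before launching Streamlit, it can help to confirm that the downloaded folder matches the expected structure. The helper below (missing_run_files, a hypothetical name, not part of the repository) checks only the four files named above; the entries hidden behind ... are not listed in this README and are not checked.

```python
from pathlib import Path

# The four files named in the expected structure; the "..." entries are not checked.
REQUIRED_FILES = ("config.yaml", "best.pt", "src_field.pt", "trg_field.pt")


def missing_run_files(run_dir):
    """Return the required files that are not present in run_dir."""
    run_dir = Path(run_dir)
    return [name for name in REQUIRED_FILES if not (run_dir / name).is_file()]
```

For example, missing_run_files("./runs/<folder_name>") should return an empty list once the weights are in place.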


3. Dataset

We use the TED Talk dataset provided in this repo by pbcquoc. Download the dataset from this link and put it in the ./data folder.

Expected structure:

    |-- train.en
    |-- train.vi
    |-- val.en
    |-- val.vi
    |-- test.en
    |-- test.vi
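The six files form a parallel corpus. The sketch below assumes the usual line-aligned convention (line i of train.en is the translation of line i of train.vi), which this README does not state explicitly. load_split is a hypothetical stdlib helper for loading one split as sentence pairs and catching misaligned files; it is not the repository's data loader.

```python
from pathlib import Path


def _read_lines(path):
    # Iterate the file rather than using str.splitlines(), which also breaks on
    # characters such as U+2028 and could shift one side of the pair.
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


def load_split(data_dir, split, src="en", trg="vi"):
    """Read one split (train, val or test) as a list of (source, target) pairs."""
    data_dir = Path(data_dir)
    src_lines = _read_lines(data_dir / f"{split}.{src}")
    trg_lines = _read_lines(data_dir / f"{split}.{trg}")
    if len(src_lines) != len(trg_lines):
        raise ValueError(
            f"{split}: {len(src_lines)} {src} lines vs {len(trg_lines)} {trg} lines"
        )
    return list(zip(src_lines, trg_lines))
```

For example, load_split("./data", "val") returns the validation pairs, or raises ValueError if val.en and val.vi have different line counts.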

4. Training

You can modify the model architecture, optimizer hyperparameters, and other settings in the config file ./configs/_base_.yaml. Then run:

python ./tools/train.py \
    --config_path ./configs/_base_.yaml \
    --device cuda:0
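Judging from the evaluation example (./runs/2023-06-28_16-57-19), training results appear to be saved in timestamp-named folders under ./runs. Assuming that naming scheme, the hypothetical helper below picks the newest run to pass as --runs_path, skipping folders with other names such as a downloaded pretrained run. It is a sketch, not part of the repository.

```python
from datetime import datetime
from pathlib import Path

# Assumed run-folder naming, inferred from the example ./runs/2023-06-28_16-57-19.
RUN_NAME_FORMAT = "%Y-%m-%d_%H-%M-%S"


def latest_run(runs_dir="./runs"):
    """Return the newest timestamp-named folder in runs_dir, or None if there is none."""
    root = Path(runs_dir)
    if not root.is_dir():
        return None
    runs = []
    for path in root.iterdir():
        if not path.is_dir():
            continue
        try:
            stamp = datetime.strptime(path.name, RUN_NAME_FORMAT)
        except ValueError:
            continue  # not a timestamped run, e.g. a downloaded pretrained folder
        runs.append((stamp, path))
    if not runs:
        return None
    return max(runs, key=lambda run: run[0])[1]
```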

5. Evaluation

Using the result folder produced by training, you can evaluate the model with this command:

python tools/eval.py \
    --runs_path ./runs/2023-06-28_16-57-19 \
    --beam-size 3 \
    --device cuda:0

You can also add the --run-train-set argument to evaluate on the training set, but this takes a long time to complete.
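The --beam-size argument sets how many partial translations are kept at each decoding step. The sketch below is a generic, framework-free beam search, not this repository's implementation: step_log_probs is a hypothetical stand-in for one Transformer decoder step, and the toy probability table is made up only to show that a beam of 2 can find a higher-probability sentence than greedy decoding (a beam of 1). Real decoders often add length normalization, which this sketch omits.

```python
import math


def beam_search(step_log_probs, bos, eos, beam_size=3, max_len=20):
    """Return the highest-scoring sequence (including bos/eos) found by beam search.

    step_log_probs(prefix) -> {next_token: log_probability}; in a real model
    this would run the decoder on the prefix and the encoded source sentence.
    """
    beams = [([bos], 0.0)]  # (tokens, cumulative log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            for token, logp in step_log_probs(tokens).items():
                candidates.append((tokens + [token], score + logp))
        # Keep only the beam_size best hypotheses; completed ones stop growing.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_size]:
            if tokens[-1] == eos:
                finished.append((tokens, score))
            else:
                beams.append((tokens, score))
        if not beams:
            break
    if not finished:  # max_len reached: fall back to unfinished hypotheses
        finished = beams
    return max(finished, key=lambda c: c[1])[0]


# Made-up next-token probabilities, for illustration only.
TABLE = {
    ("<s>",): {"A": 0.6, "B": 0.4},
    ("<s>", "A"): {"X": 0.5, "Y": 0.5},
    ("<s>", "B"): {"</s>": 0.9, "Z": 0.1},
    ("<s>", "A", "X"): {"</s>": 1.0},
    ("<s>", "A", "Y"): {"</s>": 1.0},
    ("<s>", "B", "Z"): {"</s>": 1.0},
}


def toy_model(prefix):
    return {tok: math.log(p) for tok, p in TABLE[tuple(prefix)].items()}


if __name__ == "__main__":
    print(beam_search(toy_model, "<s>", "</s>", beam_size=1))  # ['<s>', 'A', 'X', '</s>'] (p = 0.30)
    print(beam_search(toy_model, "<s>", "</s>", beam_size=2))  # ['<s>', 'B', '</s>'] (p = 0.36)
```

Greedy decoding commits to "A" because it is locally more likely, while a wider beam keeps "B" alive long enough to find the better complete sequence; larger beams cost proportionally more decoder calls per step.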