PyTorch and Torchtext implementation of sequence-to-sequence models

Pytorch-Torchtext-Seq2Seq

PyTorch implementation of the paper "Neural Machine Translation by Jointly Learning to Align and Translate".
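For orientation, the attention mechanism described in that paper can be summarized as follows. This is the formulation from the paper, using its notation (s_{i-1} is the previous decoder state, h_j the j-th encoder annotation); the exact parameterization in this repository's code may differ.

```latex
e_{ij} = v_a^{\top} \tanh\left( W_a s_{i-1} + U_a h_j \right), \qquad
\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k} \exp(e_{ik})}, \qquad
c_i = \sum_{j} \alpha_{ij} h_j
```

The context vector c_i is a weighted sum of the encoder annotations, so the decoder can focus on different source positions at each output step.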

Prerequisites

Getting Started

1. Clone the repository

$ git clone https://github.com/Mjkim88/Pytorch-Torchtext-Seq2Seq.git
$ cd Pytorch-Torchtext-Seq2Seq

2. Download the dataset

$ bash download.sh

This command downloads the Europarl v7 and dev datasets to the data/ folder. If you want to use a different dataset, you can skip this step.

3. Train the model

$ python main.py --dataset 'europarl' --src_lang 'fr' --trg_lang 'en' --data_path './data' \
                 --train_path './data/training/europarl-v7.fr-en' --val_path './data/dev/newstest2013' \
                 --log log --sample sample

The first time you run the above command, the model starts by preprocessing the data with Torchtext and automatically saves the preprocessed JSON file to /data, so that later runs do not preprocess the same dataset again.
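The caching behaviour described above can be sketched roughly as follows. This is a minimal illustration of the "preprocess once, reuse the JSON file" pattern, not the repository's actual code: the function names `load_or_preprocess` and `whitespace_tokenize` are hypothetical, and the tokenizer is a simple stand-in for the real Torchtext preprocessing.

```python
import json
import os


def load_or_preprocess(raw_path, cache_path, preprocess):
    """Return preprocessed examples, reusing a JSON cache file if it exists.

    `preprocess` is any callable that takes the raw file path and returns
    JSON-serializable data (hypothetical interface, for illustration only).
    """
    if os.path.exists(cache_path):
        # Cache hit: skip preprocessing and load the saved result.
        with open(cache_path, encoding="utf-8") as f:
            return json.load(f)

    # Cache miss: preprocess the raw data and save it for later runs.
    examples = preprocess(raw_path)
    with open(cache_path, "w", encoding="utf-8") as f:
        json.dump(examples, f, ensure_ascii=False)
    return examples


def whitespace_tokenize(raw_path):
    """Hypothetical stand-in for the real preprocessing step:
    lowercase each non-empty line and split it on whitespace."""
    with open(raw_path, encoding="utf-8") as f:
        return [line.lower().split() for line in f if line.strip()]
```

Calling `load_or_preprocess(raw_path, cache_path, whitespace_tokenize)` twice with the same paths runs the tokenizer only on the first call; the second call reads the JSON file instead. Deleting the cached file forces preprocessing to run again.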

(Optional) Tensorboard visualization

$ tensorboard --logdir='./logs' --port=8888

For the TensorBoard visualization, open a new terminal, run the command above, and open http://localhost:8888 in your web browser.