English to Japanese Translator in PyTorch 🙊 (Transformer from scratch)
- English to Japanese translator built with PyTorch.
- The neural network architecture is the Transformer.
- The Transformer layers are implemented from scratch in PyTorch (you can find them under layers/transformer/).
- The parallel corpus (dataset) is KFTT.
The Transformer is a neural network model proposed in the paper ‘Attention Is All You Need’.
As the paper's title suggests, the Transformer is a model built on the attention mechanism. Unlike RNNs and LSTMs, it does not rely on recurrent computation during training.
Many of the models that have achieved high accuracy in various tasks in the NLP domain in recent years, such as BERT, GPT-3, and XLNet, have a Transformer-based structure.
Install dependencies & create a virtual environment in project by running:
$ poetry install
Set PYTHONPATH by running:
$ export PYTHONPATH="$(pwd)"
Download & unzip the parallel corpus (KFTT) by running:
$ poetry run python ./utils/download.py
The directory structure is as below.
.
├── const
│ └── path.py
├── corpus
│ └── kftt-data-1.0
├── figure
├── layers
│ └── transformer
│ ├── Embedding.py
│ ├── FFN.py
│ ├── MultiHeadAttention.py
│ ├── PositionalEncoding.py
│ ├── ScaledDotProductAttention.py
│ ├── TransformerDecoder.py
│ └── TransformerEncoder.py
├── models
│ ├── Transformer.py
│ └── __init__.py
├── mypy.ini
├── pickles
│ └── nn/
├── poetry.lock
├── poetry.toml
├── pyproject.toml
├── tests
│ ├── conftest.py
│ ├── layers/
│ ├── models/
│ └── utils/
├── train.py
└── utils
├── dataset/
├── download.py
├── evaluation/
└── text/
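Since the Transformer has no recurrence, the layer in PositionalEncoding.py is what injects word-order information into the embeddings. The sketch below shows the standard sinusoidal scheme from the paper; the function name is illustrative and may differ from the repo's implementation.

```python
import math

import torch


def positional_encoding(max_len: int, d_model: int) -> torch.Tensor:
    """Sinusoidal positional encodings from 'Attention Is All You Need'.

    pe[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    pe[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    """
    pe = torch.zeros(max_len, d_model)
    position = torch.arange(max_len, dtype=torch.float).unsqueeze(1)
    # Frequencies computed in log space for numerical stability.
    div_term = torch.exp(
        torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model)
    )
    pe[:, 0::2] = torch.sin(position * div_term)  # even dimensions
    pe[:, 1::2] = torch.cos(position * div_term)  # odd dimensions
    return pe
```

The resulting (max_len, d_model) table is simply added to the token embeddings before the first encoder or decoder layer.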
You can train the model by running:
$ poetry run python train.py
epoch: 1
--------------------Train--------------------
train loss: 10.104473114013672, bleu score: 0.0,iter: 1/4403
train loss: 9.551202774047852, bleu score: 0.0,iter: 2/4403
train loss: 8.950608253479004, bleu score: 0.0,iter: 3/4403
train loss: 8.688143730163574, bleu score: 0.0,iter: 4/4403
train loss: 8.4220552444458, bleu score: 0.0,iter: 5/4403
train loss: 8.243291854858398, bleu score: 0.0,iter: 6/4403
train loss: 8.187620162963867, bleu score: 0.0,iter: 7/4403
train loss: 7.6360859870910645, bleu score: 0.0,iter: 8/4403
....
- After each epoch, the model at that point is saved under pickles/nn/.
- When training finishes, loss.png is saved under figure/.
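The per-epoch checkpointing described above can be done with plain `torch.save`. This is a minimal sketch, assuming a helper like the one below (the function name and file naming scheme are hypothetical, not necessarily what train.py does):

```python
import os

import torch


def save_checkpoint(model: torch.nn.Module, epoch: int, directory: str = "pickles/nn") -> str:
    """Save the model's state_dict after an epoch, mirroring pickles/nn/ in the repo layout."""
    os.makedirs(directory, exist_ok=True)
    # One file per epoch, e.g. pickles/nn/epoch_1.pt (naming scheme is illustrative).
    path = os.path.join(directory, f"epoch_{epoch}.pt")
    torch.save(model.state_dict(), path)
    return path
```

Saving the `state_dict` rather than the whole module keeps checkpoints small and lets any epoch be reloaded later with `model.load_state_dict(torch.load(path))`, e.g. to resume training or run inference with the best-scoring epoch.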