
PyTorch Transformer

This is my from-scratch implementation of the Transformer in PyTorch, following the original encoder-decoder structure.

Here is the structure in detail:

(Transformer architecture diagram)

A. Some important things to note

We can observe that this structure includes:

  1. Transformer Block - the main building block of the Encoder.
  2. Decoder Block - which combines a Masked Multi-Head Attention layer with a Transformer Block.
  3. Multi-Head Attention and Masked Multi-Head Attention - these weigh the importance of each token relative to every other token, which is what lets this architecture surpass plain LSTMs or RNNs (see the first sketch after this list).
  4. Positional Embedding - this encodes the position of each token, since attention by itself is order-agnostic (see the second sketch after this list).
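
To make item 3 concrete, here is a minimal sketch of multi-head attention with scaled dot-product scoring. The class and argument names (`MultiHeadAttention`, `embed_dim`, `num_heads`) are my own for illustration; the classes in this repository may be organized differently:

```python
import math
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    def __init__(self, embed_dim: int, num_heads: int):
        super().__init__()
        assert embed_dim % num_heads == 0, "embed_dim must divide evenly across heads"
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads
        # One projection each for queries, keys, values, plus the output projection.
        self.q_proj = nn.Linear(embed_dim, embed_dim)
        self.k_proj = nn.Linear(embed_dim, embed_dim)
        self.v_proj = nn.Linear(embed_dim, embed_dim)
        self.out_proj = nn.Linear(embed_dim, embed_dim)

    def forward(self, query, key, value, mask=None):
        batch, q_len, _ = query.shape
        k_len = key.shape[1]
        # Project and split into heads: (batch, heads, seq_len, head_dim).
        q = self.q_proj(query).view(batch, q_len, self.num_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(key).view(batch, k_len, self.num_heads, self.head_dim).transpose(1, 2)
        v = self.v_proj(value).view(batch, k_len, self.num_heads, self.head_dim).transpose(1, 2)
        # Scaled dot-product attention scores: (batch, heads, q_len, k_len).
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.head_dim)
        if mask is not None:
            # Masked positions receive -inf so softmax gives them zero weight;
            # this is the mechanism behind the decoder's Masked Multi-Head Attention.
            scores = scores.masked_fill(mask == 0, float("-inf"))
        attn = torch.softmax(scores, dim=-1)
        # Weighted sum of values, then merge the heads back together.
        out = (attn @ v).transpose(1, 2).reshape(batch, q_len, -1)
        return self.out_proj(out)
```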
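And for item 4, a sketch of the sinusoidal positional encoding from the original Transformer paper, assuming an even `embed_dim`. Note this repository may instead use a learned `nn.Embedding` over positions, which is a common alternative:

```python
import math
import torch
import torch.nn as nn

class PositionalEncoding(nn.Module):
    """Sinusoidal positional encoding (assumes embed_dim is even)."""

    def __init__(self, embed_dim: int, max_len: int = 5000):
        super().__init__()
        position = torch.arange(max_len).unsqueeze(1)  # (max_len, 1)
        div_term = torch.exp(torch.arange(0, embed_dim, 2) * (-math.log(10000.0) / embed_dim))
        pe = torch.zeros(max_len, embed_dim)
        pe[:, 0::2] = torch.sin(position * div_term)  # even dimensions
        pe[:, 1::2] = torch.cos(position * div_term)  # odd dimensions
        self.register_buffer("pe", pe)  # not a trainable parameter

    def forward(self, x):
        # x: (batch, seq_len, embed_dim); add the encoding for each position.
        return x + self.pe[: x.size(1)]
```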

B. In case you want to try running this on your local computer

To clone this repository, run the following command:

git clone https://github.com/phanng0605/PytorchTransformer.git

Then install the required packages:

pip install -r requirements.txt
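
After installing, how you drive the model depends on how the code is organized. The sketch below is purely hypothetical: the module path `model`, the `Transformer` class, and its constructor arguments are assumptions for illustration, not this repository's actual API:

```python
import torch

# Hypothetical usage: `model`, `Transformer`, and the keyword arguments
# below are illustrative assumptions, not this repo's actual interface.
from model import Transformer

model = Transformer(
    src_vocab_size=10000,
    trg_vocab_size=10000,
    embed_dim=512,
    num_heads=8,
    num_layers=6,
)
src = torch.randint(0, 10000, (2, 20))  # (batch, src_len) token ids
trg = torch.randint(0, 10000, (2, 15))  # (batch, trg_len) token ids
logits = model(src, trg)                # (batch, trg_len, trg_vocab_size)
```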