Transformer-4-Translation

Implementation of the 'Attention Is All You Need' paper, with training for language translation on the OPUS Books dataset.

Link to paper: https://arxiv.org/pdf/1706.03762

Architecture

The model is implemented in model.py and consists of the following components:

  • Input Embeddings: Converts input tokens into dense vectors.
  • Positional Encoding: Adds information about the position of tokens in the sequence.
  • Multi-Head Attention: Implements scaled dot-product self-attention and cross-attention mechanisms (a minimal sketch follows this list).
  • Feed Forward Block: Processes the output of the attention layer.
  • Encoder and Decoder Stacks: Stacks of layers for encoding and decoding sequences.
  • Output Projection Layer: Projects the decoder output to the vocabulary size for prediction.
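
For orientation, here is a minimal PyTorch sketch of the scaled dot-product attention used inside a multi-head attention block. The function name and tensor shapes are illustrative and do not necessarily match the code in model.py:

import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    # q, k, v: (batch, heads, seq_len, d_k)
    d_k = q.size(-1)
    # Dot-product similarity, scaled by sqrt(d_k) to keep softmax gradients stable
    scores = (q @ k.transpose(-2, -1)) / math.sqrt(d_k)
    if mask is not None:
        # Padding / future positions are hidden by setting their scores to -inf
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = scores.softmax(dim=-1)  # attention distribution over the keys
    return weights @ v                # weighted sum of the value vectors

q = k = v = torch.rand(2, 8, 10, 64)          # (batch=2, heads=8, seq_len=10, d_k=64)
out = scaled_dot_product_attention(q, k, v)   # -> shape (2, 8, 10, 64)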

Configuration

The model and training parameters are specified in config.py. Key parameters include the following (an illustrative example follows the list):

  • batch_size: Number of samples per gradient update.
  • num_epochs: Total number of training epochs.
  • lr: Learning rate for the optimizer.
  • seq_len: Maximum sequence length for input sentences.
  • d_model: Dimensionality of the embeddings and internal representations of the model.
  • datasource: Name of the dataset being used (e.g., opus_books).
  • lang_src: Source language code (e.g., "en" for English).
  • lang_tgt: Target language code (e.g., "es" for Spanish).
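
For reference, a configuration with these fields might look like the sketch below; the function name get_config and all values shown are illustrative placeholders, not the defaults from config.py:

def get_config():
    # Illustrative values only; consult config.py for the actual defaults
    return {
        "batch_size": 8,
        "num_epochs": 20,
        "lr": 1e-4,
        "seq_len": 350,
        "d_model": 512,
        "datasource": "opus_books",
        "lang_src": "en",
        "lang_tgt": "es",
    }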

Dataset

The dataset used is the OPUS Books dataset, a collection of copyright-free books aligned as bilingual sentence pairs across many language combinations, which makes it a convenient resource for training translation models.
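
Assuming the dataset is pulled through the Hugging Face datasets library (the repository may obtain it differently), the aligned sentence pairs can be loaded roughly as follows:

from datasets import load_dataset

# The language pair follows the lang_src/lang_tgt values from the configuration
raw_dataset = load_dataset("opus_books", "en-es", split="train")
print(raw_dataset[0]["translation"])  # e.g. {'en': '...', 'es': '...'}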

Training

To train the model, ensure you have the dataset and tokenizers ready. Run the training script:

python train.py
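
The objective optimized by train.py is standard teacher-forced cross-entropy over the projected decoder outputs. The self-contained sketch below illustrates that objective using PyTorch's built-in nn.Transformer as a stand-in for the custom model in model.py; all sizes and tensors are dummies, and positional encodings are omitted for brevity:

import torch
import torch.nn as nn

vocab_size, d_model, seq_len, batch = 1000, 512, 32, 4

modules = nn.ModuleDict({
    "embed_src": nn.Embedding(vocab_size, d_model),
    "embed_tgt": nn.Embedding(vocab_size, d_model),
    "core": nn.Transformer(d_model=d_model, batch_first=True),  # stand-in model
    "project": nn.Linear(d_model, vocab_size),
})
optimizer = torch.optim.Adam(modules.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

# Dummy batch: source ids, right-shifted decoder input ids, and expected next tokens
src = torch.randint(0, vocab_size, (batch, seq_len))
tgt_in = torch.randint(0, vocab_size, (batch, seq_len))
labels = torch.randint(0, vocab_size, (batch, seq_len))

# Causal mask so each decoder position only attends to earlier positions
causal_mask = nn.Transformer.generate_square_subsequent_mask(seq_len)
decoder_out = modules["core"](modules["embed_src"](src),
                              modules["embed_tgt"](tgt_in),
                              tgt_mask=causal_mask)
logits = modules["project"](decoder_out)  # (batch, seq_len, vocab_size)
loss = loss_fn(logits.view(-1, vocab_size), labels.view(-1))

optimizer.zero_grad()
loss.backward()
optimizer.step()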