
A Keras+TensorFlow Implementation of the Transformer: Attention Is All You Need


The Transformer model in "Attention Is All You Need": a Keras implementation.

A Keras+TensorFlow Implementation of the Transformer: "Attention Is All You Need" (Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, arXiv, 2017)

Usage

Please refer to en2de_main.py and pinyin_main.py for usage examples.

en2de_main.py

Results

  • The code achieves results close to those of the original repository: about 70% validation accuracy. With a smaller model, such as layers=2 and d_model=256, validation accuracy is higher, because the task is quite small.
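The README does not define "valid accuracy". A common choice for sequence-to-sequence models is per-token accuracy on the validation set with padding positions excluded. The sketch below assumes that definition and a padding id of 0; both are assumptions, not taken from en2de_main.py:

```python
import numpy as np


def masked_token_accuracy(y_true: np.ndarray, logits: np.ndarray, pad_id: int = 0) -> float:
    """Fraction of non-padding target tokens whose argmax prediction is correct.

    y_true: int array of shape (batch, seq_len) holding target token ids.
    logits: float array of shape (batch, seq_len, vocab_size).
    pad_id: token id used for padding (assumed to be 0 in this sketch).
    """
    preds = np.argmax(logits, axis=-1)      # predicted token id at each position
    mask = y_true != pad_id                 # True where a real target token exists
    correct = (preds == y_true) & mask      # correct AND not padding
    return float(correct.sum()) / max(int(mask.sum()), 1)
```

Excluding padding matters: counting padded positions as correct would inflate the score, especially for batches of short sentences.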

For your own data

  • Preprocess your source and target sequences into the same format as en2de.s2s.txt and pinyin.corpus.examples.txt.
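A minimal sketch of writing your own parallel data, assuming each line holds one sentence pair with source and target separated by a tab and tokens separated by single spaces. This layout is an assumption, so compare it with en2de.s2s.txt and pinyin.corpus.examples.txt before relying on it; the helper names `to_s2s_lines` and `write_s2s_file` are hypothetical:

```python
from typing import Iterable, List, Tuple

Pair = Tuple[List[str], List[str]]  # (source tokens, target tokens)


def to_s2s_lines(pairs: Iterable[Pair]) -> List[str]:
    """Format token pairs as 'src tokens<TAB>tgt tokens' lines (assumed layout)."""
    lines = []
    for src, tgt in pairs:
        for tok in src + tgt:
            # A token containing a separator would corrupt the assumed layout.
            if any(ch in tok for ch in ("\t", " ", "\n")):
                raise ValueError(f"token {tok!r} contains a separator character")
        lines.append(" ".join(src) + "\t" + " ".join(tgt))
    return lines


def write_s2s_file(path: str, pairs: Iterable[Pair]) -> None:
    """Write the formatted pairs to a UTF-8 text file, one pair per line."""
    with open(path, "w", encoding="utf-8") as f:
        for line in to_s2s_lines(pairs):
            f.write(line + "\n")
```

For example, `write_s2s_file("my_data.s2s.txt", [(["hello", "world"], ["hallo", "welt"])])` would write the single line `hello world<TAB>hallo welt`.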

Some notes

  • For a larger number of layers, the special learning-rate schedule reported in the paper (linear warmup followed by inverse-square-root decay) is necessary.
  • In pinyin_main.py, I tried another way to train the deep network: first train the embedding layer and the first layer, then a 2-layer model, then a 3-layer model, and so on. This works for this task.
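The schedule in the paper is lrate = d_model^-0.5 * min(step^-0.5, step * warmup_steps^-1.5), with warmup_steps = 4000: the rate grows linearly for the first warmup_steps steps and then decays with the inverse square root of the step number. A plain-Python sketch of that formula follows; the repository's own scheduler may differ in details such as scaling or warmup length:

```python
def transformer_lr(step: int, d_model: int = 512, warmup_steps: int = 4000) -> float:
    """Learning rate at a given training step, per the formula in the paper."""
    step = max(step, 1)  # 0 ** -0.5 would raise ZeroDivisionError at step 0
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)
```

The two terms inside `min` are equal at step = warmup_steps, so the peak rate is (d_model * warmup_steps)^-0.5. Because the paper updates the rate per training step rather than per epoch, in Keras this fits a `tf.keras.optimizers.schedules.LearningRateSchedule` subclass better than the per-epoch `LearningRateScheduler` callback.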

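The layer-by-layer procedure can be sketched without any framework. Weights are plain NumPy arrays here, and `init_layer` and the `train` callback are hypothetical stand-ins for building and fitting the Keras model. The README does not say whether earlier layers are frozen or how long each stage trains, so this sketch simply retrains the embedding and every current layer at each stage:

```python
from typing import Callable, Dict, List, Tuple

import numpy as np

Params = Dict[str, np.ndarray]


def init_layer(d_model: int, rng: np.random.Generator) -> Params:
    """Hypothetical stand-in for one Transformer layer's freshly initialised weights."""
    return {"w": rng.normal(0.0, d_model ** -0.5, size=(d_model, d_model))}


def grow_and_train(
    n_layers: int,
    d_model: int,
    vocab_size: int,
    train: Callable[[Params, List[Params]], None],
    seed: int = 0,
) -> Tuple[Params, List[Params]]:
    """Train a 1-layer model, then repeatedly append one layer and train again.

    Weights trained at stage k are reused unchanged as the starting point of
    stage k+1; only the newly appended layer starts from random values.
    """
    rng = np.random.default_rng(seed)
    embedding: Params = {"emb": rng.normal(0.0, d_model ** -0.5, size=(vocab_size, d_model))}
    layers: List[Params] = []
    for _ in range(n_layers):
        layers.append(init_layer(d_model, rng))  # the only new, untrained layer
        train(embedding, layers)  # updates embedding and all current layers in place
    return embedding, layers
```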
Upgrades

  • Refactored some classes.
  • The components are easier to reuse in other models: just import transformer.py.
  • Added a fast step-by-step decoder, including an upgraded beam search. Both still need some modification before they are reusable elsewhere.
  • Updated for TensorFlow 2.6.0.
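The README does not describe the decoder's interface. As a rough illustration of beam search itself, the sketch below keeps the `beam_size` highest-scoring prefixes at each step. `step_fn` is a hypothetical callback returning next-token log-probabilities; a fast step-by-step decoder would implement it with cached per-layer states instead of re-running the whole model on each prefix. Hypotheses are ranked by summed log-probability with no length penalty, which the repository's upgraded version may handle differently:

```python
from typing import Callable, List, Sequence, Tuple

StepFn = Callable[[List[int]], Sequence[float]]
Hyp = Tuple[List[int], float]  # (token ids including bos, total log-probability)


def beam_search(step_fn: StepFn, bos: int, eos: int,
                beam_size: int = 3, max_len: int = 20) -> List[Hyp]:
    """Return hypotheses sorted by total log-probability, best first.

    step_fn(prefix) must return one log-probability per vocabulary id for the
    next token, given the tokens decoded so far (starting with bos).
    """
    beams: List[Hyp] = [([bos], 0.0)]
    finished: List[Hyp] = []
    for _ in range(max_len):
        # Expand every live beam by every possible next token.
        candidates: List[Hyp] = []
        for seq, score in beams:
            for tok, lp in enumerate(step_fn(seq)):
                candidates.append((seq + [tok], score + lp))
        candidates.sort(key=lambda c: c[1], reverse=True)
        # Keep the best beam_size; hypotheses ending in eos stop growing.
        beams = []
        for seq, score in candidates[:beam_size]:
            (finished if seq[-1] == eos else beams).append((seq, score))
        if not beams:
            break
    finished.extend(beams)  # beams still open when max_len was reached
    return sorted(finished, key=lambda c: c[1], reverse=True)
```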

Acknowledgement