Introducing Sparsity in the Transformer model (Keras Implementation)

A proof-of-concept implementation of evolutionary sparsity in the Transformer model architecture.
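The core mechanism, in a nutshell: a sparse layer keeps a binary mask over its weight matrix and, between epochs, prunes the weakest active connections and regrows the same number at random positions (evolutionary, SET-style rewiring). The sketch below only illustrates this idea; the class name, API, and hyper-parameters are assumptions for illustration, not the repository's actual code.

```python
import numpy as np
import tensorflow as tf

class SparseDense(tf.keras.layers.Layer):
    """Dense layer whose kernel is multiplied by a binary sparsity mask (illustrative)."""

    def __init__(self, units, density=0.1, **kwargs):
        super().__init__(**kwargs)
        self.units = units
        self.density = density

    def build(self, input_shape):
        in_dim = int(input_shape[-1])
        self.kernel = self.add_weight(
            name="kernel", shape=(in_dim, self.units), initializer="glorot_uniform")
        self.bias = self.add_weight(
            name="bias", shape=(self.units,), initializer="zeros")
        # Random initial mask keeping roughly `density` of all connections.
        init_mask = (np.random.rand(in_dim, self.units) < self.density).astype("float32")
        self.mask = tf.Variable(init_mask, trainable=False)

    def call(self, inputs):
        return tf.matmul(inputs, self.kernel * self.mask) + self.bias

    def evolve(self, prune_fraction=0.3):
        """Prune the weakest active connections and regrow as many at random positions."""
        weights = (self.kernel * self.mask).numpy()
        mask = self.mask.numpy()
        active = np.argwhere(mask > 0)
        n_prune = int(prune_fraction * len(active))
        if n_prune == 0:
            return
        # Drop the active connections with the smallest absolute weight.
        magnitudes = np.abs(weights[active[:, 0], active[:, 1]])
        weakest = active[np.argsort(magnitudes)[:n_prune]]
        mask[weakest[:, 0], weakest[:, 1]] = 0.0
        # Regrow the same number of connections at random inactive positions.
        inactive = np.argwhere(mask == 0)
        regrow = inactive[np.random.choice(len(inactive), n_prune, replace=False)]
        mask[regrow[:, 0], regrow[:, 1]] = 1.0
        self.mask.assign(mask)
```

Calling evolve() after each epoch rewires the sparse connectivity before training continues.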

How To Run:

Sparse Variant of Transformer

Sparse variant of the architecture, trained on the original data (29,000 samples in the training set, 1,024 samples in the test set)

python3 en2de_main.py sparse origdata
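The entry point presumably dispatches on two positional sys.argv values (model variant and dataset); a minimal sketch of that pattern, where the usage string and variable names are assumptions:

```python
import sys

def main():
    # Expected usage: python3 en2de_main.py <variant> <dataset> [load_existing_model]
    if len(sys.argv) < 3:
        sys.exit("usage: en2de_main.py {sparse|originalWithTransfer} {origdata|testdata} [load_existing_model]")
    variant, dataset = sys.argv[1], sys.argv[2]
    resume = "load_existing_model" in sys.argv[3:]
    print(f"variant={variant}, dataset={dataset}, resume={resume}")

if __name__ == "__main__":
    main()
```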

Original Transformer

*Original architecture with a rewritten training loop, using a custom transfer function, in order to validate the obtained results*

python3 en2de_main.py originalWithTransfer origdata
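Here, "transfer function" refers to the activation applied inside the layers. A hypothetical example of wiring a custom transfer function into a Keras layer (the actual function used in the repository may differ):

```python
import tensorflow as tf

def custom_transfer(x):
    # Hypothetical transfer (activation) function: a smooth, bounded nonlinearity.
    return tf.math.tanh(x) * tf.math.sigmoid(x)

# A custom transfer function plugs into a layer like any built-in activation.
layer = tf.keras.layers.Dense(512, activation=custom_transfer)
```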

Flags:

load_existing_model

Loads the model saved from previous training epochs and continues training it
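Resuming from a checkpoint typically looks like the sketch below; the checkpoint file name and the stand-in model are assumptions, not the repository's actual code:

```python
import os
import tensorflow as tf

CHECKPOINT = "en2de_model.h5"  # assumed checkpoint file name

def build_model():
    # Stand-in for the actual Transformer construction in en2de_main.py.
    return tf.keras.Sequential([
        tf.keras.Input(shape=(512,)),
        tf.keras.layers.Dense(512, activation="relu"),
        tf.keras.layers.Dense(512),
    ])

model = build_model()
if os.path.exists(CHECKPOINT):
    # With load_existing_model: restore the weights from the previous run.
    model.load_weights(CHECKPOINT)
model.compile(optimizer="adam", loss="mse")
# model.fit(...) then continues training; persist again with model.save_weights(CHECKPOINT).
```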

Datasets

Sets the dataset to be used for the training task (a dataset-selection sketch follows the list)

  • 'origdata': Use the WMT 2016 German-to-English dataset for training
  • 'testdata': Use a very small subset of the original training data for quick test runs
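Dataset selection from this argument might look like the following sketch; the file path, corpus format, and subset size are illustrative assumptions:

```python
def read_parallel_corpus(path):
    # Assumed format: one "english<TAB>german" sentence pair per line.
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n").split("\t") for line in f if "\t" in line]

def load_dataset(name):
    # 'origdata': the full WMT 2016 German-to-English data; 'testdata': a tiny subset.
    pairs = read_parallel_corpus("data/en2de.txt")  # assumed file path
    if name == "testdata":
        pairs = pairs[:1024]  # small subset for quick smoke tests
    return pairs
```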

Research papers:

Code based on / uses parts of:

F.A.Q.

  • Running with the test dataset argument raises: UnicodeEncodeError: 'ascii' codec can't encode character '\xe4' in position 6: ordinal not in range(128).
    Solution: run export LC_CTYPE=C.UTF-8 in the terminal.
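If changing the locale is not possible, forcing UTF-8 on Python's standard output avoids the same class of error (an alternative workaround, not part of the repository):

```python
import sys

# Force UTF-8 output so non-ASCII characters such as German umlauts print without errors.
sys.stdout.reconfigure(encoding="utf-8")
print("Beispiel: Mädchen")
```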