Language Models with Transformers

Reference: Chenguang Wang, Mu Li, Alexander J. Smola. "Language Models with Transformers". arXiv preprint arXiv:1904.09408 (2019).

Installation

pip install --pre --upgrade mxnet
pip install gluonnlp
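
To sanity-check the installation, the one-liner below should print the installed MXNet and GluonNLP versions without raising an import error (any recent pair of versions compatible with the scripts should work):

$ python -c "import mxnet, gluonnlp; print(mxnet.__version__, gluonnlp.__version__)"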

Results

The models below are trained on the WikiText-2 and WikiText-103 datasets, respectively.

The settings used to reproduce the WikiText-2 results with the corresponding pre-trained models are listed in the following table.

| Model | bert_lm_12_768_12_300_1150_wikitext2 | bert_lm_24_1024_16_300_1150_wikitext2 |
|---|---|---|
| Val PPL | 38.43 | 37.79 |
| Test PPL | 34.64 | 34.11 |
| Command | [1] | [2] |
| Result logs | log | log |

[1] bert_lm_12_768_12_300_1150_wikitext2 (Val PPL 38.43, Test PPL 34.64)

$ cd scripts/language_model
$ python transformer_language_model.py --data wikitext2 --model bert_lm_12_768_12_300_1150 --val_batch_size 8 --test_batch_size 8 --bptt 128 --seed 1882 --batch_size 16 --gpus 0

[2] bert_lm_24_1024_16_300_1150_wikitext2 (Val PPL 37.79, Test PPL 34.11)

$ cd scripts/language_model
$ python transformer_language_model.py --data wikitext2 --model bert_lm_24_1024_16_300_1150 --val_batch_size 8 --test_batch_size 8 --bptt 128 --seed 1882 --batch_size 16 --gpus 0

The settings used to reproduce the WikiText-103 results with the corresponding pre-trained models are listed in the following table.

| Model | bert_lm_12_768_12_400_2500_wikitext103 | bert_lm_24_1024_16_400_2500_wikitext103 |
|---|---|---|
| Val PPL | 40.70 | 20.33 |
| Test PPL | 39.85 | 20.54 |
| Command | [1] | [2] |
| Result logs | log | log |

[1] bert_lm_12_768_12_400_2500_wikitext103 (Val PPL 40.70, Test PPL 39.85)

$ cd scripts/language_model
$ python transformer_language_model.py --data wikitext103 --model bert_lm_12_768_12_400_2500 --val_batch_size 8 --test_batch_size 8 --bptt 64 --seed 1111 --batch_size 20 --gpus 0

[2] bert_lm_24_1024_16_400_2500_wikitext103 (Val PPL 20.33, Test PPL 20.54)

$ cd scripts/language_model
$ python transformer_language_model.py --data wikitext103 --model bert_lm_24_1024_16_400_2500 --val_batch_size 8 --test_batch_size 8 --bptt 64 --seed 1111 --batch_size 12 --gpus 0

Note that multi-GPU evaluation is also supported; see the sketch below. The pre-trained model bert_lm_24_1024_16_400_2500_wikitext103 will be updated soon.
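
The commands above evaluate on a single device (--gpus 0). As a hedged sketch, assuming the script accepts a comma-separated device list for --gpus (run python transformer_language_model.py --help to confirm the exact format), a multi-GPU evaluation would look like:

$ cd scripts/language_model
$ python transformer_language_model.py --data wikitext103 --model bert_lm_24_1024_16_400_2500 --val_batch_size 8 --test_batch_size 8 --bptt 64 --seed 1111 --batch_size 12 --gpus 0,1,2,3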

Reference Paper

The BibTeX entry of the reference paper is:

@article{lmtransformer2019,
   title={Language Models with Transformers},
   author={Chenguang Wang and Mu Li and Alexander J. Smola},
   journal={ArXiv},
   year={2019},
   volume={abs/1904.09408}
}