tm4roon/pytorch-xlm

An implementation of cross-lingual language model pre-training (XLM).

XLM: Cross-lingual Language Model Pretraining

An implementation of Cross-lingual Language Model Pretraining (XLM) using PyTorch. You can choose one of the following three training objectives with the --task option:

Causal language model (--task causal)
Masked language model (--task masked)
Translation language model (--task translation)
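
To make the difference between the three objectives concrete, here is a minimal pure-Python sketch of how inputs and prediction targets could be built for each task. It is not the code in train.py. The function names, the `<mask>` and `</s>` symbols, and the 15% masking rate with the 80/10/10 replacement split (as described in the XLM paper) are assumptions for illustration, and details such as language embeddings and position resets are omitted.

```python
import random

MASK = "<mask>"  # hypothetical mask symbol; the repository's vocabulary may differ
SEP = "</s>"     # hypothetical separator between the two sentences of a pair


def causal_example(tokens):
    """Causal LM: each token is predicted from the tokens before it."""
    return tokens[:-1], tokens[1:]


def masked_example(tokens, vocab, rng, mask_prob=0.15):
    """Masked LM: corrupt a random subset of positions and predict the originals."""
    inputs, targets = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            targets.append(tok)  # this position is scored
            r = rng.random()
            if r < 0.8:
                inputs.append(MASK)               # 80%: replace with the mask symbol
            elif r < 0.9:
                inputs.append(rng.choice(vocab))  # 10%: replace with a random token
            else:
                inputs.append(tok)                # 10%: keep the original token
        else:
            inputs.append(tok)
            targets.append(None)  # this position is not scored
    return inputs, targets


def translation_example(src, tgt, vocab, rng, mask_prob=0.15):
    """Translation LM: mask both sides of a sentence pair and concatenate them,
    so the model can use either language to recover a masked token."""
    src_in, src_out = masked_example(src, vocab, rng, mask_prob)
    tgt_in, tgt_out = masked_example(tgt, vocab, rng, mask_prob)
    # The separator itself is never masked or scored in this sketch.
    return src_in + [SEP] + tgt_in, src_out + [None] + tgt_out
```

The causal and masked objectives need only monolingual text, while the translation objective needs aligned sentence pairs, which is why the --train option expects a different file format for it.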

Settings

This code depends on the following:

python==3.6.5
pytorch==1.1.0
torchtext==0.3.1

git clone https://github.com/t080/pytorch-xlm.git
cd ./pytorch-xlm
pip install -r requirements.txt

Usage

To train a causal language model or a masked language model, pass a monolingual corpus (.txt) to the --train option.

# Replace "causal" with "masked" to train a masked language model.
python train.py \
  --task causal \
  --train /path/to/train.txt \
  --savedir ./checkpoints \
  --gpu

To train a translation language model, pass a parallel corpus (.tsv) to the --train option.

python train.py \
  --task translation \
  --train /path/to/train.tsv \
  --savedir ./checkpoints \
  --gpu
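
The README does not spell out the layout of the .tsv file. The sketch below assumes one sentence pair per line, with the source and target sentences separated by a single tab. Check the data loading code in the repository before relying on this. The helper names `write_parallel_tsv` and `read_parallel_tsv` are hypothetical and not part of the repository.

```python
def write_parallel_tsv(pairs, path):
    """Write (source, target) sentence pairs as tab-separated lines.

    Assumed layout: "<source sentence>\t<target sentence>" on each line.
    Returns the number of pairs written.
    """
    written = 0
    with open(path, "w", encoding="utf-8") as f:
        for src, tgt in pairs:
            # Collapse tabs, newlines, and repeated spaces so that each pair
            # occupies exactly one line with exactly one tab.
            src = " ".join(src.split())
            tgt = " ".join(tgt.split())
            if not src or not tgt:
                continue  # skip pairs with an empty side
            f.write(f"{src}\t{tgt}\n")
            written += 1
    return written


def read_parallel_tsv(path):
    """Read the file back into a list of (source, target) tuples."""
    pairs = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            src, tgt = line.rstrip("\n").split("\t")
            pairs.append((src, tgt))
    return pairs
```

A file produced this way could then be passed as /path/to/train.tsv in the command above, provided the assumed column layout matches what train.py expects.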

References

Lample, Guillaume, and Alexis Conneau. "Cross-lingual language model pretraining." arXiv preprint arXiv:1901.07291 (2019).