
XLM: Cross-lingual Language Model Pretraining

An implementation of Cross-lingual Language Model Pretraining (XLM) in PyTorch. You can choose from the following three training tasks.

  • Causal language model (--task causal)
  • Masked language model (--task masked)
  • Translation language model (--task translation)
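The masked task follows the BERT/XLM recipe: roughly 15% of token positions are selected for prediction; of those, 80% are replaced with a mask token, 10% with a random token, and 10% are left unchanged. A minimal sketch of that selection step (plain Python; the function name, mask token string, and seeding are illustrative, not taken from this repository):

```python
import random

def mask_tokens(tokens, mask_token="[MASK]", vocab=None, mlm_prob=0.15, seed=0):
    """BERT/XLM-style masking sketch.

    Selects ~15% of positions for prediction; of those, 80% become
    mask_token, 10% a random vocabulary token, 10% stay unchanged.
    labels holds the original token at selected positions, None elsewhere.
    """
    rng = random.Random(seed)        # seeded for reproducibility in this sketch
    vocab = vocab or list(tokens)    # fallback vocabulary for random replacement
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mlm_prob:  # position selected for prediction
            labels.append(tok)
            r = rng.random()
            if r < 0.8:
                inputs.append(mask_token)          # 80%: mask
            elif r < 0.9:
                inputs.append(rng.choice(vocab))   # 10%: random token
            else:
                inputs.append(tok)                 # 10%: keep original
        else:
            inputs.append(tok)
            labels.append(None)      # not predicted at this position
    return inputs, labels
```

The loss is then computed only at positions where the label is not None.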

Settings

This code depends on the following.

  • python==3.6.5
  • pytorch==1.1.0
  • torchtext==0.3.1
git clone https://github.com/t080/pytorch-xlm.git
cd ./pytorch-xlm
pip install -r requirements.txt

Usage

When training a causal language model or a masked language model, you must give a monolingual corpus (.txt) to the --train option.

python train.py \
  --task causal (or masked) \
  --train /path/to/train.txt \
  --savedir ./checkpoints \
  --gpu
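For the causal task, each position simply predicts the next token, so the input and target sequences are the same token sequence shifted by one. A minimal sketch of that pairing (the function name and example IDs are illustrative):

```python
def causal_pairs(token_ids):
    """Next-token prediction targets: input is tokens[:-1], target is tokens[1:]."""
    return token_ids[:-1], token_ids[1:]

# e.g. IDs for: <s> w1 w2 w3 </s>
ids = [2, 17, 5, 9, 3]
inputs, targets = causal_pairs(ids)
# inputs  = [2, 17, 5, 9]
# targets = [17, 5, 9, 3]
```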

When training a translation language model, you must give a parallel corpus (.tsv) to the --train option.

python train.py \
  --task translation \
  --train /path/to/train.tsv \
  --savedir ./checkpoints \
  --gpu
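The .tsv file is assumed here to hold one sentence pair per line, source and target separated by a tab; check the repository's data loader for the exact layout it expects. A small sketch of reading such a file:

```python
import csv
import io

# Inline stand-in for /path/to/train.tsv: one "source<TAB>target" pair per line.
tsv_text = "Hello world\tBonjour le monde\nGood morning\tBonjour\n"

# csv with a tab delimiter handles the format; a real script would open the file.
pairs = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
for src, tgt in pairs:
    print(src, "->", tgt)
```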

References