A PyTorch implementation of Cross-lingual Language Model Pretraining (XLM). You can train any of the following three models, selected with the --task option.
- Causal language model (--task causal)
- Masked language model (--task masked)
- Translation language model (--task translation)
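As a rough illustration of how the three objectives differ, the sketch below builds the inputs and targets for each one in plain Python. It follows the description in the XLM paper (15% of tokens selected, of which 80% become a mask symbol, 10% a random token, and 10% stay unchanged) and is not taken from this repository's code. The function names, the `<mask>` symbol, and the `</s>` separator are hypothetical.

```python
import random

MASK = "<mask>"  # hypothetical mask symbol; the repository's vocabulary may differ


def mask_tokens(tokens, vocab, mask_prob=0.15, seed=None):
    """Masked LM: return (inputs, targets).

    targets[i] holds the original token at selected positions, None elsewhere.
    """
    rng = random.Random(seed)
    inputs, targets = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            targets.append(tok)
            r = rng.random()
            if r < 0.8:
                inputs.append(MASK)               # 80%: replace with the mask symbol
            elif r < 0.9:
                inputs.append(rng.choice(vocab))  # 10%: replace with a random token
            else:
                inputs.append(tok)                # 10%: keep the original token
        else:
            inputs.append(tok)
            targets.append(None)
    return inputs, targets


def causal_pairs(tokens):
    """Causal LM: each position predicts the next token."""
    return tokens[:-1], tokens[1:]


def translation_input(src_tokens, tgt_tokens, sep="</s>"):
    """Translation LM: concatenate a parallel pair; masking is then applied to both sides."""
    return src_tokens + [sep] + tgt_tokens
```

For the translation objective, the concatenated sequence would be passed through `mask_tokens` so the model can use context from either language to recover a masked word.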
This code depends on the following.
- python==3.6.5
- pytorch==1.1.0
- torchtext==0.3.1
git clone https://github.com/t080/pytorch-xlm.git
cd ./pytorch-xlm
pip install -r requirements.txt
To train a causal or masked language model, pass a monolingual corpus (.txt) to the --train option and set --task to causal or masked.
python train.py \
--task causal \
--train /path/to/train.txt \
--savedir ./checkpoints \
--gpu
To train a translation language model, pass a parallel corpus (.tsv) to the --train option.
python train.py \
--task translation \
--train /path/to/train.tsv \
--savedir ./checkpoints \
--gpu
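If your parallel data comes as two line-aligned files (one per language), a small helper like the one below can merge them into a single .tsv. This is only a sketch: it assumes the expected layout is one sentence pair per line, with source and target separated by a single tab, which you should confirm against the repository's data loader. The function name and file names are hypothetical.

```python
def write_parallel_tsv(src_path, tgt_path, out_path):
    """Join two line-aligned files into one TSV and return the number of pairs written.

    Assumed layout (not confirmed by the repository): source<TAB>target, one pair per line.
    """
    n = 0
    with open(src_path, encoding="utf-8") as src, \
         open(tgt_path, encoding="utf-8") as tgt, \
         open(out_path, "w", encoding="utf-8", newline="") as out:
        # zip stops at the shorter file, so check beforehand that both have the same length
        for s, t in zip(src, tgt):
            s, t = s.strip(), t.strip()
            # skip empty sides and sentences containing a tab, which would break the columns
            if not s or not t or "\t" in s or "\t" in t:
                continue
            out.write(s + "\t" + t + "\n")
            n += 1
    return n
```

For example, `write_parallel_tsv("train.en", "train.fr", "train.tsv")` would produce a file that can then be passed to --train.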