Running RetroMAE v2 (Duplex Masked Auto-Encoder for Pre-Training Retrieval-Oriented Language Models) on RoBERTa models.
This repository is cloned from @hieudx149's X-RetroMAE repository.
X-DupMAE modifies RetroMAE v2 to be compatible with RoBERTa and XLM-RoBERTa; hopefully this project helps anyone who wants to apply RetroMAE v2 to languages other than English.
Changes compared to the hieudx149 version:
- Copied RetroMAE v2's modeling_duplex.py and changed all Bert* classes to their Roberta* counterparts (see the sketch after this list)
- Copied the DupMAECollator class into data.py
- Copied the code for switching between retromae and dupmae into run.py
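
As a rough illustration of the first change, the sketch below shows the kind of Bert* to Roberta* substitution involved. The class name RobertaDuplexEncoder and the use of a stock RobertaLayer as the shallow decoder are illustrative assumptions rather than the repository's exact code (the original customizes the decoder layer); only RobertaModel and RobertaLayer from Hugging Face transformers are real APIs here.

```python
# Illustrative sketch only: swap BertModel/BertLayer for their Roberta
# counterparts when porting modeling_duplex.py to RoBERTa/XLM-RoBERTa.
import torch.nn as nn
from transformers import RobertaModel
from transformers.models.roberta.modeling_roberta import RobertaLayer


class RobertaDuplexEncoder(nn.Module):  # hypothetical name, not the repo's class
    """Deep RoBERTa encoder plus a shallow one-layer decoder, RetroMAE-style."""

    def __init__(self, model_name: str = "roberta-base"):
        super().__init__()
        # Was BertModel in the BERT-based implementation.
        self.encoder = RobertaModel.from_pretrained(model_name)
        # Was a BertLayer-based decoder; a stock RobertaLayer stands in here.
        self.decoder = RobertaLayer(self.encoder.config)

    def encode(self, input_ids, attention_mask):
        # The <s> (CLS) hidden state serves as the sentence embedding that
        # the shallow decoder reconstructs the masked input from.
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        return out.last_hidden_state[:, 0]
```

The retromae/dupmae switch copied into run.py then amounts to choosing which model variant and collator (e.g. DupMAECollator) get instantiated from the training arguments.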
Install the dependencies:

```bash
pip install --upgrade pip
pip install -r requirements.txt
```
First make sure you have preprocessed your data by running preprocessing.py in examples/pretrain, then start pre-training with:

```bash
sh src/run_pretrain.sh
```
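
The exact input format is defined by examples/pretrain/preprocessing.py; the snippet below is only a rough sketch of the usual corpus tokenization step. The file paths, tokenizer name, and output location are assumptions, so defer to the script for the real arguments.

```python
# Rough sketch of corpus tokenization (assumed paths/format; the actual
# expected layout is defined by examples/pretrain/preprocessing.py).
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")

def tokenize(batch):
    # One passage per line; special tokens are added later by the collator.
    return tokenizer(batch["text"], truncation=True, max_length=512,
                     add_special_tokens=False)

raw = load_dataset("text", data_files={"train": "data/corpus.txt"})
tokenized = raw.map(tokenize, batched=True, remove_columns=["text"])
tokenized.save_to_disk("data/pretrain_tokenized")
```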
If you use this code, please cite the original RetroMAE paper:

```bibtex
@inproceedings{RetroMAE,
  title={RetroMAE: Pre-Training Retrieval-oriented Language Models Via Masked Auto-Encoder},
  author={Shitao Xiao and Zheng Liu and Yingxia Shao and Zhao Cao},
  url={https://arxiv.org/abs/2205.12035},
  booktitle={EMNLP},
  year={2022},
}
```