- Our released RRS corpus and Crawled Douban Nonparallel corpus can be found here.
- Our released BERT-FP post-training checkpoint for the RRS corpus can be found here.
- Our post-training and fine-tuning checkpoints on the Ubuntu, Douban, E-commerce, and our released RRS datasets can be found here. Feel free to use them to reproduce the experimental results in the paper.
## Init the repo

Before using the repo, please run the following commands to initialize it:

```bash
# create the necessary folders
python init.py
# prepare the environment
pip install -r requirements.txt
```
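The repo does not document which folders `init.py` creates. As a hypothetical sketch only (the folder names `ckpt` and `log` and the helper name `init_folders` are assumptions, not taken from the repo), such a script might create one directory per dataset:

```python
# Hypothetical sketch of an init script: create the per-dataset folders
# that training scripts typically expect. Folder names are illustrative.
import os

def init_folders(datasets=("douban", "ecommerce", "ubuntu", "restoration-200k"),
                 roots=("ckpt", "log")):
    """Create root/dataset directories, skipping any that already exist."""
    for root in roots:
        for dataset in datasets:
            path = os.path.join(root, dataset)
            os.makedirs(path, exist_ok=True)  # idempotent: safe to re-run
            print(f"created {path}")

if __name__ == "__main__":
    init_folders()
```

The actual `init.py` shipped with the repo is authoritative; this sketch only illustrates the idea of an idempotent setup step.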
## Train the model

The necessary details can be found under the `config` folder.

```bash
# dataset_name: douban, ecommerce, ubuntu, restoration-200k
# model_name: dual-bert (DR-BERT), bert-ft, sa-bert, bert-fp (post-training), poly-encoder
./scripts/train.sh <dataset_name> <model_name> <cuda_ids>
```
## Test the model

```bash
./scripts/test_rerank.sh <dataset_name> <model_name> <cuda_id>
```