Code for our EACL-2021 paper "Generating Syntactically Controlled Paraphrases without Using Annotated Parallel Pairs".
If you find this code useful in your research, please consider citing our paper.
@inproceedings{Huang2021synpg,
    author    = {Kuan-Hao Huang and Kai-Wei Chang},
    title     = {Generating Syntactically Controlled Paraphrases without Using Annotated Parallel Pairs},
    booktitle = {Proceedings of the Conference of the European Chapter of the Association for Computational Linguistics (EACL)},
    year      = {2021},
}
- Python=3.7.10
$ pip install -r requirements.txt
- Download the pretrained SynPG model and the pretrained parse generator, and put them in
./model
- Run
demo.sh
or the following command to generate demo/output.txt
python generate.py \
--synpg_model_path ./model/pretrained_synpg.pt \
--pg_model_path ./model/pretrained_parse_generator.pt \
--input_path ./demo/input.txt \
--output_path ./demo/output.txt \
--bpe_codes_path ./data/bpe.codes \
--bpe_vocab_path ./data/vocab.txt \
--bpe_vocab_thresh 50 \
--dictionary_path ./data/dictionary.pkl \
--max_sent_len 40 \
--max_tmpl_len 100 \
--max_synt_len 160 \
--temp 0.5 \
--seed 0
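Before running the demo, it can help to confirm that every file the command expects is in place. The following is a minimal sketch, not part of the repository: the helper name `missing_files` is hypothetical, and the path list is simply copied from the `generate.py` command above.

```python
from pathlib import Path

# Input paths copied from the generate.py command above.
DEMO_INPUTS = [
    "./model/pretrained_synpg.pt",
    "./model/pretrained_parse_generator.pt",
    "./demo/input.txt",
    "./data/bpe.codes",
    "./data/vocab.txt",
    "./data/dictionary.pkl",
]


def missing_files(paths, root="."):
    """Return the entries of `paths` that are not existing files under `root`.

    Hypothetical helper for illustration; it is not provided by the repository.
    """
    root = Path(root)
    return [p for p in paths if not (root / p).is_file()]


if __name__ == "__main__":
    missing = missing_files(DEMO_INPUTS)
    if missing:
        print("Missing files:")
        for p in missing:
            print("  " + p)
    else:
        print("All demo inputs found.")
```

Run it from the repository root so that the relative paths resolve the same way they do for `generate.py`.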
- Download the data and put it under
./data/
- Download glove.840B.300d.txt and put it under
./data/
- Run
train_synpg.sh
or the following command to train SynPG
python train_synpg.py \
--model_dir ./model \
--output_dir ./output \
--bpe_codes_path ./data/bpe.codes \
--bpe_vocab_path ./data/vocab.txt \
--bpe_vocab_thresh 50 \
--dictionary_path ./data/dictionary.pkl \
--train_data_path ./data/train_data.h5 \
--valid_data_path ./data/valid_data.h5 \
--emb_path ./data/glove.840B.300d.txt \
--max_sent_len 40 \
--max_synt_len 160 \
--word_dropout 0.4 \
--n_epoch 5 \
--batch_size 64 \
--lr 1e-4 \
--weight_decay 1e-5 \
--log_interval 250 \
--gen_interval 5000 \
--save_interval 10000 \
--temp 0.5 \
--seed 0
- Run
train_parse_generator.sh
or the following command to train the parse generator
python train_parse_generator.py \
--model_dir ./model \
--output_dir ./output_pg \
--dictionary_path ./data/dictionary.pkl \
--train_data_path ./data/train_data.h5 \
--valid_data_path ./data/valid_data.h5 \
--max_sent_len 40 \
--max_tmpl_len 100 \
--max_synt_len 160 \
--word_dropout 0.2 \
--n_epoch 5 \
--batch_size 32 \
--lr 1e-4 \
--weight_decay 1e-5 \
--log_interval 250 \
--gen_interval 5000 \
--save_interval 10000 \
--temp 0.5 \
--seed 0
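Since the GloVe file is a large download, a quick sanity check on its first few lines can catch a truncated or wrong file before a training run starts. The sketch below assumes the standard GloVe text format (a token followed by space-separated values, 300 of them for this file). It is not the loader used by `train_synpg.py`, and the names `parse_glove_line` and `check_embeddings` are hypothetical.

```python
def parse_glove_line(line, dim=300):
    """Split one GloVe text line into (token, vector).

    Splitting from the right keeps the token intact even if it
    contains spaces itself.
    """
    parts = line.rstrip("\n").rsplit(" ", dim)
    if len(parts) != dim + 1:
        raise ValueError(
            f"expected a token and {dim} values, got {len(parts)} fields"
        )
    return parts[0], [float(v) for v in parts[1:]]


def check_embeddings(path, dim=300, n_lines=5):
    """Parse the first `n_lines` lines of `path` and return their tokens.

    Raises ValueError if a line does not have `dim` numeric values.
    """
    tokens = []
    with open(path, encoding="utf-8") as f:
        for _, line in zip(range(n_lines), f):
            token, _vector = parse_glove_line(line, dim)
            tokens.append(token)
    return tokens
```

For example, `check_embeddings("./data/glove.840B.300d.txt")` reads only the first five lines, so it returns quickly even though the file is large.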
Kuan-Hao Huang / @ej0cl6