

ViPG

Code for the paper Visual Information Guided Zero-Shot Paraphrase Generation by Zhe Lin and Xiaojun Wan, accepted to Findings of COLING'22. Please contact me at linzhe@pku.edu.cn with any questions.

(Figure: model overview.)

Dependencies

PyTorch 1.4
StanfordCoreNLP
Pillow
NumPy
Transformers
subword-nmt

System Output

If you are looking for system outputs and don't want to install dependencies and train a model (or run a pre-trained one), the all-system-output folder is for you.

Train a new model

Step 1: Preprocess

Image

You should first use gerate_img_feature.py to generate the image feature vectors. We refer to the output file of image feature vectors as image_feature.
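As a rough sketch of the storage step that follows feature extraction: features are typically pooled CNN outputs saved one vector per image. The per-image .npy layout and the 2048-dimensional size below are assumptions for illustration; gerate_img_feature.py defines the actual format.

```python
import os
import tempfile

import numpy as np

def save_image_features(features, names, out_dir):
    # One .npy file per image, keyed by image name. This layout is an
    # assumption for illustration; check gerate_img_feature.py for the
    # actual on-disk format it writes.
    os.makedirs(out_dir, exist_ok=True)
    for vec, name in zip(features, names):
        np.save(os.path.join(out_dir, name + ".npy"), vec)

# Toy example: two 2048-dimensional vectors (the size of a typical
# pooled CNN feature, e.g. a ResNet's final average-pool output).
feats = np.random.rand(2, 2048).astype(np.float32)
out_dir = tempfile.mkdtemp()
save_image_features(feats, ["img001", "img002"], out_dir)
```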

Text

You should first use POS.py to transform a text into the following format, which produces the text file *.pos.

(Figure: example of the POS-tagged text format.)
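A minimal sketch of one plausible word_TAG output format — the underscore separator is a guess, so check POS.py for the exact separator and tag set it produces:

```python
def to_pos_format(tagged_tokens):
    # Join each (word, tag) pair as "word_TAG". The underscore
    # separator is an assumption; POS.py defines the real format.
    return " ".join(f"{word}_{tag}" for word, tag in tagged_tokens)

# Tags in the style of a Penn-Treebank tagger such as Stanford CoreNLP.
tagged = [("a", "DT"), ("dog", "NN"), ("runs", "VBZ")]
line = to_pos_format(tagged)
print(line)  # a_DT dog_NN runs_VBZ
```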

Then you can use the following command to apply byte pair encoding to word segmentation:

subword-nmt learn-bpe -s 32000 < *.pos > *.pos.code
subword-nmt apply-bpe -c *.pos.code < *.pos > *.pos.bpe
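Conceptually, apply-bpe greedily replaces symbol pairs according to the learned merge table in *.pos.code. A toy version for a single word (subword-nmt's real implementation also handles merge priorities across a sentence and vocabulary thresholds):

```python
def apply_bpe_word(word, merges):
    # Toy version of subword-nmt's apply-bpe for one word: apply each
    # learned merge rule in priority order, then mark subword
    # boundaries with "@@ " as subword-nmt does.
    symbols = list(word)
    for a, b in merges:
        i = 0
        while i < len(symbols) - 1:
            if symbols[i] == a and symbols[i + 1] == b:
                symbols[i:i + 2] = [a + b]
            else:
                i += 1
    return "@@ ".join(symbols)

print(apply_bpe_word("lower", [("l", "o"), ("lo", "w")]))  # low@@ e@@ r
```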

Next, you should generate the vocabulary of the corpora as follows:

python createVocab.py --file *.pos.bpe\
                      --lower --save_path ./vocab.share  --min_freq 1
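The effect of the --lower and --min_freq flags can be sketched as a simple frequency count; the real createVocab.py's output format and any special-token handling may differ.

```python
from collections import Counter

def build_vocab(lines, min_freq=1, lower=True):
    # Frequency-count tokens, mirroring the --lower / --min_freq
    # flags above. The serialized vocab.share format is defined by
    # createVocab.py, not by this sketch.
    counts = Counter()
    for line in lines:
        if lower:
            line = line.lower()
        counts.update(line.split())
    return {tok: c for tok, c in counts.items() if c >= min_freq}

vocab = build_vocab(["a_DT dog_NN", "a_DT cat_NN"], min_freq=2)
print(vocab)  # {'a_dt': 2}
```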

Finally, you can use the following command to generate the training data file:

python preprocess.py --sent_file *.pos.bpe \
                     --img_file image_file.img \
                     --vocab vocab.share \
                     --save_file train.data

image_file.img is a text file; each line of it is an image file name that corresponds to the same line of the training file *.pos.bpe.
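The line-by-line alignment can be illustrated as follows (the file names and captions are hypothetical):

```python
import os
import tempfile

# Line i of image_file.img names the image paired with sentence i of
# the *.pos.bpe caption file. Names here are purely illustrative.
captions = ["a_DT dog_NN runs_VBZ", "two_CD cats_NNS sit_VBP"]
image_names = ["img001.jpg", "img002.jpg"]
assert len(captions) == len(image_names)

path = os.path.join(tempfile.mkdtemp(), "image_file.img")
with open(path, "w") as f:
    f.write("\n".join(image_names) + "\n")
```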

There are two examples of preprocessed datasets in ./data/flickr and ./data/mscoco.

Step 2: Train Model

python train.py --cuda_num 2\
                --share_embed \
                --vocab ./vocab.share \
                --file ./train.data\
                --img_path ./image_feature/ \
                --checkpoint_path ./model \
                --checkpoint_n_epoch 1 \
                --grad_accum 1 \
                --max_tokens 5000 \
                --max_batch_size 256 \
                --discard_invalid_data
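A plausible reading of --max_tokens and --max_batch_size is batching capped by both total token count and sentence count, sketched below; the trainer's actual batching logic may differ.

```python
def make_batches(lengths, max_tokens=5000, max_batch_size=256):
    # Group sentence indices into batches, starting a new batch when
    # either the token budget or the sentence-count budget would be
    # exceeded. This mirrors the flags above as an assumption only.
    batches, cur, cur_tokens = [], [], 0
    for i, n in enumerate(lengths):
        if cur and (cur_tokens + n > max_tokens or len(cur) >= max_batch_size):
            batches.append(cur)
            cur, cur_tokens = [], 0
        cur.append(i)
        cur_tokens += n
    if cur:
        batches.append(cur)
    return batches

print(make_batches([10, 10, 10], max_tokens=20))  # [[0, 1], [2]]
```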

Generate Paraphrases

python generator.py --cuda_num 1 \
                 --raw_file *.pos.bpe \
                 --ref_file *.test \
                 --max_tokens 300 \
                 --vocab ./vocab.share \
                 --decode_method beam \
                 --beam 10 \
                 --model_path ./model/checkpoint.pkl \
                 --output_path ./output \
                 --max_add_token 50 \
                 --max_alpha 1

We calculate self-BLEU, so the reference file is the input file itself.
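Self-BLEU here means scoring each output against its own input, so copying the input scores 1.0 and a more diverse paraphrase scores lower. A simplified single-reference sketch, with ad-hoc clipping instead of proper smoothing (the evaluation code's exact BLEU implementation may differ):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def self_bleu(source, paraphrase, max_n=4):
    # BLEU of the paraphrase against its own source sentence: clipped
    # n-gram precisions combined by geometric mean, times a brevity
    # penalty. Zero precisions are floored to avoid log(0).
    ref, hyp = source.split(), paraphrase.split()
    precisions = []
    for n in range(1, max_n + 1):
        hyp_ngrams = Counter(ngrams(hyp, n))
        ref_ngrams = Counter(ngrams(ref, n))
        overlap = sum((hyp_ngrams & ref_ngrams).values())
        total = max(sum(hyp_ngrams.values()), 1)
        precisions.append(max(overlap, 1e-9) / total)
    bp = math.exp(min(0.0, 1 - len(ref) / max(len(hyp), 1)))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

print(self_bleu("the dog runs fast", "the dog runs fast"))  # 1.0
```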

Pre-trained Models

We release our pretrained model here.

Results

(Figure: evaluation results from the paper.)

Citation

If you use any content of this repo for your work, please cite the following bib entry:

@inproceedings{lin-wan-2022-visual,
    title = "Visual Information Guided Zero-Shot Paraphrase Generation",
    author = "Lin, Zhe  and
      Wan, Xiaojun",
    booktitle = "Proceedings of the 29th International Conference on Computational Linguistics",
    month = oct,
    year = "2022",
    address = "Gyeongju, Republic of Korea",
    publisher = "International Committee on Computational Linguistics",
    url = "https://aclanthology.org/2022.coling-1.568",
    pages = "6530--6539",
    abstract = "Zero-shot paraphrase generation has drawn much attention as the large-scale high-quality paraphrase corpus is limited. Back-translation, also known as the pivot-based method, is typical to this end. Several works leverage different information as {``}pivot{''} such as language, semantic representation and so on. In this paper, we explore using visual information such as image as the {``}pivot{''} of back-translation. Different with the pipeline back-translation method, we propose visual information guided zero-shot paraphrase generation (ViPG) based only on paired image-caption data. It jointly trains an image captioning model and a paraphrasing model and leverage the image captioning model to guide the training of the paraphrasing model. Both automatic evaluation and human evaluation show our model can generate paraphrase with good relevancy, fluency and diversity, and image is a promising kind of pivot for zero-shot paraphrase generation.",
}