
Keyphrase Generation


Keyphrase Generation (built on OpenNMT-py)

This repository provides the code and datasets used in the papers "One Size Does Not Fit All: Generating and Evaluating Variable Number of Keyphrases" and "Does Order Matter? An Empirical Study on Generating Multiple Keyphrases as a Sequence".

All datasets and checkpoints used in the papers can be downloaded here. Unzip the file ckpts&data.zip and overwrite the original data/ and models/ folders with its contents. Note that the data points in KP20k have been manually cleaned.
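The unzip-and-overwrite step can be scripted. The sketch below is illustrative only: it assumes (this README does not confirm it) that ckpts&data.zip stores data/ and models/ at its top level rather than inside a wrapper folder. The function name install_release is hypothetical. It deletes each original folder that the archive also contains, then extracts the archive over the repository root.

```python
import shutil
import zipfile
from pathlib import Path


def install_release(archive: str, repo_root: str = ".", folders=("data", "models")) -> list:
    """Hypothetical helper: extract the release archive into repo_root,
    replacing any of `folders` that the archive also provides.

    Assumes the archive keeps data/ and models/ at its top level.
    """
    root = Path(repo_root)
    with zipfile.ZipFile(archive) as zf:
        # Top-level entries in the archive (zip member names always use "/").
        top_level = {name.split("/", 1)[0] for name in zf.namelist()}
        replaced = []
        for folder in folders:
            target = root / folder
            if folder in top_level and target.exists():
                # Remove the original folder so stale files do not survive.
                shutil.rmtree(target)
                replaced.append(folder)
        zf.extractall(root)
    return sorted(replaced)
```

For example, `install_release("ckpts&data.zip")` run from the repository root would replace data/ and models/ in place. Deleting before extracting (instead of extracting on top) is a deliberate choice so that files from an older layout cannot mix with the released ones.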


All config files used for training and evaluation are in the config/ folder. For more examples, see the scripts in the script/ folder.

Preprocess the data

python preprocess.py -config config/preprocess/config-preprocess-keyphrase-kp20k.yml
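Keyphrase datasets are usually analysed in terms of present keyphrases (those that occur verbatim in the source text) and absent ones (those that do not). The sketch below shows that split on a title plus abstract. It is not what the preprocessing config above does; the tokenizer (lowercased alphanumeric runs) and the function names are simplifying assumptions, and evaluation in the papers also stems tokens before matching, which this sketch omits.

```python
import re
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    # Lowercase and keep alphanumeric runs; a simplification of real tokenizers.
    return re.findall(r"[a-z0-9]+", text.lower())


def contains(seq: List[str], sub: List[str]) -> bool:
    # True if `sub` occurs as a contiguous token span inside `seq`.
    n = len(sub)
    return n > 0 and any(seq[i:i + n] == sub for i in range(len(seq) - n + 1))


def split_present_absent(title: str, abstract: str,
                         keyphrases: List[str]) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: partition keyphrases into present and absent ones."""
    src = tokenize(title) + tokenize(abstract)
    present, absent = [], []
    for kp in keyphrases:
        (present if contains(src, tokenize(kp)) else absent).append(kp)
    return present, absent
```

With the title "Deep keyphrase generation" and an abstract mentioning an "encoder-decoder framework", the keyphrases "keyphrase generation" and "encoder-decoder" come out as present, while "recurrent neural network" comes out as absent.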

Train a One2Seq model with Diversity Mechanisms enabled

python train.py -config config/train/config-rnn-keyphrase-one2seq-diverse.yml
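In the One2Seq setting, all keyphrases of a document are concatenated into a single target sequence, and "Does Order Matter?" studies how the order of that concatenation affects generation. The sketch below builds such a target. The separator token `<sep>`, the function names, and the three ordering options are illustrative assumptions, not this repository's implementation: "appear" puts present keyphrases by first occurrence in the source and appends absent ones, "alpha" sorts alphabetically, and "none" keeps the given order.

```python
import re
from typing import List


def _tokens(text: str) -> List[str]:
    # Lowercased alphanumeric tokens (a simplifying assumption).
    return re.findall(r"[a-z0-9]+", text.lower())


def first_position(src: List[str], kp: List[str]) -> int:
    # Index of the first contiguous match of kp in src, or -1 if absent.
    n = len(kp)
    for i in range(len(src) - n + 1):
        if n and src[i:i + n] == kp:
            return i
    return -1


def one2seq_target(source: str, keyphrases: List[str],
                   order: str = "appear", sep: str = "<sep>") -> str:
    """Hypothetical helper: join keyphrases into one target string."""
    src = _tokens(source)
    if order == "alpha":
        ordered = sorted(keyphrases)
    elif order == "appear":
        # Present keyphrases sorted by first occurrence, then absent ones
        # in their original order (sorted() is stable).
        pos = {kp: first_position(src, _tokens(kp)) for kp in keyphrases}
        present = sorted((kp for kp in keyphrases if pos[kp] >= 0), key=lambda kp: pos[kp])
        absent = [kp for kp in keyphrases if pos[kp] < 0]
        ordered = present + absent
    elif order == "none":
        ordered = list(keyphrases)
    else:
        raise ValueError(f"unknown order: {order}")
    return f" {sep} ".join(ordered)
```

For the source "we study keyphrase generation with a sequence to sequence model", the "appear" ordering yields `keyphrase generation <sep> sequence to sequence <sep> neural networks` for the keyphrases ["sequence to sequence", "keyphrase generation", "neural networks"].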

Train a One2One model

python train.py -config config/train/config-rnn-keyphrase-one2one-stackexchange.yml
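In the One2One setting, by contrast, each training example pairs a source text with a single keyphrase, so a document with n keyphrases becomes n examples, and multiple keyphrases are obtained at inference time from the beam. The sketch below shows that expansion; the record field names `src` and `keyphrases` and the function name are hypothetical.

```python
from typing import Dict, List, Tuple


def one2one_pairs(examples: List[Dict]) -> List[Tuple[str, str]]:
    """Hypothetical helper: expand each (source, keyphrase list) record
    into one (source, keyphrase) training pair per keyphrase."""
    pairs = []
    for ex in examples:
        for kp in ex["keyphrases"]:
            pairs.append((ex["src"], kp))
    return pairs
```

For example, two documents with two and one keyphrases respectively produce three training pairs, and the first document's text is repeated in both of its pairs.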

Run generation and evaluation

python kp_gen_eval.py -tasks pred eval report \
  -config config/test/config-test-keyphrase-one2seq.yml \
  -data_dir data/keyphrase/meng17/ \
  -ckpt_dir models/keyphrase/meng17-one2seq-kp20k-topmodels/ \
  -output_dir output/meng17-one2seq-topbeam-selfterminating/meng17-one2many-beam10-maxlen40/ \
  -testsets duc inspec semeval krapivin nus \
  -gpu -1 --verbose --beam_size 10 --batch_size 32 --max_length 40 \
  --onepass --beam_terminate topbeam --eval_topbeam
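Keyphrase predictions are typically scored with F1@k against the gold keyphrases, and "One Size Does Not Fit All" additionally uses F1@M, which scores all predictions a model emits instead of a fixed top k. The sketch below computes both under stated assumptions: matching is exact after lowercasing and whitespace normalization (the papers' evaluation also stems tokens), duplicate predictions are dropped, and precision divides by the number of predictions kept rather than by k, which some evaluations do differently. It is not kp_gen_eval.py's implementation.

```python
from typing import List, Optional


def normalize(kp: str) -> str:
    # Lowercase and collapse whitespace; stemming is deliberately omitted here.
    return " ".join(kp.lower().split())


def f1_at_k(predictions: List[str], targets: List[str], k: Optional[int] = None) -> float:
    """F1 between the top-k ranked predictions and the targets.

    k=None scores every prediction, i.e. F1@M.
    """
    preds, seen = [], set()
    for p in map(normalize, predictions):
        if p not in seen:  # drop duplicates but keep rank order
            seen.add(p)
            preds.append(p)
    if k is not None:
        preds = preds[:k]
    gold = {normalize(t) for t in targets}
    if not preds or not gold:
        return 0.0
    correct = sum(p in gold for p in preds)
    if correct == 0:
        return 0.0
    precision = correct / len(preds)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)
```

With predictions ["keyphrase generation", "neural network", "Keyphrase  Generation", "beam search"] and targets ["keyphrase generation", "beam search", "sequence to sequence"], the duplicate is collapsed, two of three predictions are correct, and F1@M is 2/3; F1@1 is 0.5.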


The major contributors are Rui Meng (University of Pittsburgh), Eric Yuan (Microsoft Research, Montréal), Tong Wang (Microsoft Research, Montréal), and Khushboo Thaker (University of Pittsburgh).


If you use our code or datasets, please cite the following papers.

@article{yuan2018onesize,
  title={One Size Does Not Fit All: Generating and Evaluating Variable Number of Keyphrases},
  author={Yuan, Xingdi and Wang, Tong and Meng, Rui and Thaker, Khushboo and He, Daqing and Trischler, Adam},
  journal={arXiv preprint arXiv:1810.05241},
  year={2018}
}
@article{meng2019ordermatters,
  title={Does Order Matter? An Empirical Study on Generating Multiple Keyphrases as a Sequence},
  author={Meng, Rui and Yuan, Xingdi and Wang, Tong and Brusilovsky, Peter and Trischler, Adam and He, Daqing},
  journal={arXiv preprint arXiv:1909.03590},
  year={2019}
}
@inproceedings{meng2017deep,
  title={Deep keyphrase generation},
  author={Meng, Rui and Zhao, Sanqiang and Han, Shuguang and He, Daqing and Brusilovsky, Peter and Chi, Yu},
  booktitle={Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  year={2017}
}