keyphrase-extraction-as-sequence-labeling-data

Dataset for the paper "Keyphrase Extraction from Scholarly Articles as Sequence Labeling using Contextualized Embeddings". This repository contains only the dataset used in the experiments and an overview of it.

For more details on how the dataset was created and on the models trained on it, please refer to our paper, Keyphrase Extraction from Scholarly Articles as Sequence Labeling using Contextualized Embeddings.

We ran our experiments on three different publicly available keyphrase extraction datasets after transforming them into a format suitable for sequence labeling tasks. The formatted datasets are:

Inspec (Hulth 2003),
SemEval-2010 (Kim et al. 2010) (referred to as SE-2010),
SemEval-2017 (Augenstein et al. 2017) (referred to as SE-2017)

We did not change the tagging scheme for SE-2017, as it was already in a suitable format. For the other two datasets (Inspec and SE-2010), we used the following tagging scheme, where:

k_B -> B-KEY
k_I -> I-KEY
k_O -> O
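
As an illustration of the tagging scheme above, the following is a minimal Python sketch of how a tokenized text could be labeled with B-KEY, I-KEY, and O, and how keyphrases could be read back from the tags. It is not the script used to build this dataset. The function names `tag_tokens` and `decode_keyphrases`, the exact case-insensitive matching, and the example sentence are all assumptions made for illustration.

```python
from typing import List


def tag_tokens(tokens: List[str], keyphrases: List[List[str]]) -> List[str]:
    """Label each token B-KEY, I-KEY, or O (hypothetical helper).

    Assumes exact, case-insensitive matching of tokenized keyphrases.
    """
    lowered = [t.lower() for t in tokens]
    tags = ["O"] * len(tokens)
    # Try longer phrases first so shorter ones cannot claim their tokens.
    for phrase in sorted(keyphrases, key=len, reverse=True):
        target = [p.lower() for p in phrase]
        n = len(target)
        if n == 0:
            continue
        for start in range(len(tokens) - n + 1):
            span_free = all(tag == "O" for tag in tags[start:start + n])
            if span_free and lowered[start:start + n] == target:
                tags[start] = "B-KEY"
                for i in range(start + 1, start + n):
                    tags[i] = "I-KEY"
    return tags


def decode_keyphrases(tokens: List[str], tags: List[str]) -> List[str]:
    """Recover keyphrase strings from a B-KEY / I-KEY / O tag sequence."""
    phrases: List[str] = []
    current: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == "B-KEY":
            if current:
                phrases.append(" ".join(current))
            current = [token]
        elif tag == "I-KEY" and current:
            current.append(token)
        else:
            # O, or an I-KEY with no preceding B-KEY, closes any open phrase.
            if current:
                phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return phrases


# Made-up example sentence, not taken from the dataset.
tokens = ["We", "study", "keyphrase", "extraction", "with", "neural", "networks", "."]
keyphrases = [["keyphrase", "extraction"], ["neural", "networks"]]
tags = tag_tokens(tokens, keyphrases)
print(list(zip(tokens, tags)))
print(decode_keyphrases(tokens, tags))
```

Matching longer phrases first is one simple way to handle overlapping keyphrases. Other conventions are possible, and the paper should be consulted for the procedure actually used.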

General Dataset Stats

BiLSTM vs BiLSTM-CRF (F1-score)

Fine-tuning vs Pretrained (F1-score)

Embedding models comparison (F1-score)

References

Please cite [1] if you find the resources in this repository useful.

[1] Sahrawat, D., Mahata, D., Kulkarni, M., Zhang, H., Gosangi, R., Stent, A., Sharma, A., Kumar, Y., Shah, R.R., & Zimmermann, R. (2019). Keyphrase Extraction from Scholarly Articles as Sequence Labeling using Contextualized Embeddings. ArXiv, abs/1910.08840.

@article{Sahrawat2019KeyphraseEF,
  title={Keyphrase Extraction from Scholarly Articles as Sequence Labeling using Contextualized Embeddings},
  author={Dhruva Sahrawat and Debanjan Mahata and Mayank Kulkarni and Haimin Zhang and Rakesh Gosangi and Amanda Stent and Agniv Sharma and Yaman Kumar and Rajiv Ratn Shah and Roger Zimmermann},
  journal={ArXiv},
  year={2019},
  volume={abs/1910.08840}
}

If you use this resource in your experiments, please also cite:

[2] Hulth, A. (2003). Improved Automatic Keyword Extraction Given More Linguistic Knowledge. EMNLP.

[3] Kim, S.N., Medelyan, O., Kan, M., & Baldwin, T. (2010). SemEval-2010 Task 5: Automatic Keyphrase Extraction from Scientific Articles. SemEval@ACL.