awesome-keyphrase-papers

Papers about keyphrase generation and extraction

MIT License

Di Wu and Kai-Wei Chang

Background

Keyphrases are phrases that identify the most salient concepts in a document. Keyphrase extraction and keyphrase generation are fundamental tasks that support numerous NLP and IR applications. This repo collects influential keyphrase-related papers and resources, selected from *ACL, EMNLP, AAAI, SIGIR, and other related conferences. If you have suggestions or would like to add papers, please submit an issue or a pull request. Contributions are much appreciated! KeyphraseExtractionSurvey and Awesome-Keyphrase-Prediction are two other great repositories; check them out if you are interested.

Contents

Paper List

Keyphrase Generation Methods

  1. Deep Keyphrase Generation (Meng et al., ACL 2017)
  2. Keyphrase Generation with Correlation Constraints (Chen et al., EMNLP 2018)
  3. Semi-Supervised Learning for Neural Keyphrase Generation (Ye & Wang, EMNLP 2018)
  4. Title-Guided Encoding for Keyphrase Generation (Chen et al., AAAI 2019)
  5. An Integrated Approach for Keyphrase Generation via Exploring the Power of Retrieval and Extraction (Chen et al., NAACL 2019)
  6. Incorporating Linguistic Constraints into Keyphrase Generation (Zhao & Zhang, ACL 2019)
  7. Topic-Aware Neural Keyphrase Generation for Social Media Language (Wang et al., ACL 2019)
  8. Neural Keyphrase Generation via Reinforcement Learning with Adaptive Rewards (Chan et al., ACL 2019)
  9. Diverse Keyphrase Generation with Neural Unlikelihood Training (Bahuleyan & El Asri, COLING 2020)
  10. One Size Does Not Fit All: Generating and Evaluating Variable Number of Keyphrases (Yuan et al., ACL 2020)
  11. Exclusive Hierarchical Decoding for Deep Keyphrase Generation (Chen et al., ACL 2020)
  12. A Preliminary Exploration of GANs for Keyphrase Generation (Swaminathan et al., EMNLP 2020)
  13. Keyphrase Generation with GANs in Low-Resources Scenarios (Lancioni et al., SustaiNLP 2020)
  14. SGG: Learning to Select, Guide, and Generate for Keyphrase Generation (Zhao et al., NAACL 2021)
  15. UniKeyphrase: A Unified Extraction and Generation Framework for Keyphrase Prediction (Wu et al., Findings 2021)
  16. One2Set: Generating Diverse Keyphrases as a Set (Ye et al., ACL-IJCNLP 2021)
  17. Select, Extract and Generate: Neural Keyphrase Generation with Layer-wise Coverage Attention (Ahmad et al., ACL-IJCNLP 2021)
  18. Keyphrase Generation with Fine-Grained Evaluation-Guided Reinforcement Learning (Luo et al., Findings 2021)
  19. Heterogeneous Graph Neural Networks for Keyphrase Generation (Ye et al., EMNLP 2021)
  20. Structure-Augmented Keyphrase Generation (Kim et al., EMNLP 2021)
  21. HTKG: Deep Keyphrase Generation with Neural Hierarchical Topic Guidance (Zhang et al., SIGIR 2022)
  22. Fast and Constrained Absent Keyphrase Generation by Prompt-Based Learning (Wu et al., AAAI 2022)
  23. Unsupervised Deep Keyphrase Generation (Shen et al., AAAI 2022)
  24. Automatic Keyphrase Generation by Incorporating Dual Copy Mechanisms in Sequence-to-Sequence Learning (Wang et al., COLING 2022)
  25. Retrieval-Augmented Multilingual Keyphrase Generation with Retriever-Generator Iterative Training (Gao et al., Findings 2022)
  26. Learning Rich Representation of Keyphrases from Text (Kulkarni et al., Findings 2022)
  27. WR-One2Set: Towards Well-Calibrated Keyphrase Generation (Xie et al., EMNLP 2022)
  28. Keyphrase Generation via Soft and Hard Semantic Corrections (Zhao et al., EMNLP 2022)
  29. Representation Learning for Resource-Constrained Keyphrase Generation (Wu et al., Findings 2022)
  30. KPDROP: Improving Absent Keyphrase Generation (Ray Chowdhury et al., Findings 2022)
  31. Keyphrase Generation Beyond the Boundaries of Title and Abstract (Garg et al., Findings 2022)
  32. Unsupervised Open-domain Keyphrase Generation (Do et al., ACL 2023)
  33. Data Augmentation for Low-Resource Keyphrase Generation (Garg et al., Findings 2023)
  34. General-to-Specific Transfer Labeling for Domain Adaptable Keyphrase Generation (Meng et al., Findings 2023)
  35. Rethinking Model Selection and Decoding for Keyphrase Generation with Pre-trained Sequence-to-Sequence Models (Wu et al., EMNLP 2023)
  36. SimCKP: Simple Contrastive Learning of Keyphrase Representations (Choi et al., Findings 2023)
  37. Improving Low-Resource Keyphrase Generation through Unsupervised Title Phrase Generation (Kang & Shin, LREC-COLING 2024)
  38. Keyphrase Generation: Lessons from a Reproducibility Study (Thomas & Vajjala, LREC-COLING 2024)
  39. On Leveraging Encoder-only Pre-trained Language Models for Effective Keyphrase Generation (Wu et al., LREC-COLING 2024)
  40. Improving Absent Keyphrase Generation with Diversity Heads (Thomas & Vajjala, Findings 2024)

Keyphrase Extraction Methods

  1. A statistical learning approach to automatic indexing of controlled index terms (Leung & Kan, JASIS 1997)
  2. KEA: Practical Automatic Keyphrase Extraction (Witten et al., DL 1999)
  3. A Language Model Approach to Keyphrase Extraction (Tomokiyo & Hurst, MWE 2003)
  4. TextRank: Bringing Order into Text (Mihalcea & Tarau, EMNLP 2004)
  5. CollabRank: Towards a Collaborative Approach to Single-Document Keyphrase Extraction (Wan & Xiao, COLING 2008)
  6. A ranking approach to keyphrase extraction (Jiang et al., SIGIR 2009)
  7. Automatic Keyphrase Extraction via Topic Decomposition (Liu et al., EMNLP 2010)
  8. KP-Miner: Participation in SemEval-2 (El-Beltagy & Rafea, SemEval 2010)
  9. TopicRank: Graph-Based Topic Ranking for Keyphrase Extraction (Bougouin et al., IJCNLP 2013)
  10. Diverse Keyword Extraction from Conversations (Habibi & Popescu-Belis, ACL 2013)
  11. Single Document Keyphrase Extraction Using Label Information (Negi, COLING 2014)
  12. Extracting Discriminative Keyphrases with Learned Semantic Hierarchies (Wang et al., COLING 2016)
  13. Keyphrase Annotation with Graph Co-Ranking (Bougouin et al., COLING 2016)
  14. A Graph Degeneracy-based Approach to Keyword Extraction (Tixier et al., EMNLP 2016)
  15. Supervised Keyphrase Extraction as Positive Unlabeled Learning (Sterckx et al., EMNLP 2016)
  16. Keyphrase Extraction Using Deep Recurrent Neural Networks on Twitter (Zhang et al., EMNLP 2016)
  17. Incorporating Expert Knowledge into Keyphrase Extraction (Gollapalli et al., AAAI 2017)
  18. Salience Rank: Efficient Keyphrase Extraction with Topic Modeling (Teneva & Cheng, ACL 2017)
  19. Multi-Task Learning of Keyphrase Boundary Classification (Augenstein & Søgaard, ACL 2017)
  20. PositionRank: An Unsupervised Approach to Keyphrase Extraction from Scholarly Documents (Florescu & Caragea, ACL 2017)
  21. Learning Feature Representations for Keyphrase Extraction (Florescu & Jin, AAAI 2018)
  22. Simple Unsupervised Keyphrase Extraction using Sentence Embeddings (Bennani-Smires et al., CoNLL 2018)
  23. Yake! collection-independent automatic keyword extractor (Campos et al., ECIR 2018)
  24. Unsupervised Keyphrase Extraction with Multipartite Graphs (Boudin, NAACL 2018)
  25. Key2Vec: Automatic Ranked Keyphrase Extraction from Scientific Articles using Phrase Embeddings (Mahata et al., NAACL 2018)
  26. DivGraphPointer: A Graph Pointer Network for Extracting Diverse Keyphrases (Zhang et al., SIGIR 2019)
  27. Glocal: Incorporating Global Information in Local Convolution for Keyphrase Extraction (Prasad & Kan, NAACL 2019)
  28. Open Domain Web Keyphrase Extraction Beyond Language Modeling (Xiong et al., EMNLP-IJCNLP 2019)
  29. Using Human Attention to Extract Keyphrase from Microblog Post (Zhang & Zhang, ACL 2019)
  30. SaSAKE: Syntax and Semantics Aware Keyphrase Extraction from Research Papers (Santosh et al., COLING 2020)
  31. KeyGames: A Game Theoretic Approach to Automatic Keyphrase Extraction (Saxena et al., COLING 2020)
  32. A Joint Learning Approach based on Self-Distillation for Keyphrase Extraction from Scientific Documents (Lai et al., COLING 2020)
  33. SIFRank: a new baseline for unsupervised keyphrase extraction based on pre-trained language model (Sun et al., IEEE Access 2020)
  34. Web Document Encoding for Structure-Aware Keyphrase Extraction (Kim et al., SIGIR 2021)
  35. Keyphrase Extraction from Scientific Articles via Extractive Summarization (Kontoulis et al., SDP 2021)
  36. Word centrality constrained representation for keyphrase extraction (Gero & Ho, BioNLP 2021)
  37. Exploiting Position and Contextual Word Embeddings for Keyphrase Extraction from Scientific Papers (Patel & Caragea, EACL 2021)
  38. Keyphrase Extraction with Incomplete Annotated Training Data (Lei et al., WNUT 2021)
  39. Importance Estimation from Multiple Perspectives for Keyphrase Extraction (Song et al., EMNLP 2021)
  40. Unsupervised Keyphrase Extraction by Jointly Modeling Local and Global Context (Liang et al., EMNLP 2021)
  41. AttentionRank: Unsupervised Keyphrase Extraction using Self and Cross Attentions (Ding & Luo, EMNLP 2021)
  42. UCPhrase: Unsupervised Context-aware Quality Phrase Tagging (Gu et al., KDD 2021)
  43. AGRank: Augmented Graph-based Unsupervised Keyphrase Extraction (Ding & Luo, AACL-IJCNLP 2022)
  44. Hyperbolic Relevance Matching for Neural Keyphrase Extraction (Song et al., NAACL 2022)
  45. MDERank: A Masked Document Embedding Rank Approach for Unsupervised Keyphrase Extraction (Zhang et al., Findings 2022)
  46. Unsupervised Keyphrase Extraction via Interpretable Neural Networks (Joshi et al., Findings 2023)
  47. PromptRank: Unsupervised Keyphrase Extraction Using Prompt (Kong et al., ACL 2023)
  48. Improving Embedding-based Unsupervised Keyphrase Extraction by Incorporating Structural Information (Song et al., Findings 2023)
  49. Unsupervised Keyphrase Extraction by Learning Neural Keyphrase Set Function (Song et al., Findings 2023)
  50. Multi-Task Knowledge Distillation with Embedding Constraints for Scholarly Keyphrase Boundary Classification (Park & Caragea, EMNLP 2023)
  51. SAMRank: Unsupervised Keyphrase Extraction using Self-Attention Map in BERT and GPT-2 (Kang & Shin, EMNLP 2023)
  52. HyperRank: Hyperbolic Ranking Model for Unsupervised Keyphrase Extraction (Song et al., EMNLP 2023)
  53. Mitigating Over-Generation for Unsupervised Keyphrase Extraction with Heterogeneous Centrality Detection (Song et al., EMNLP 2023)
  54. Clustering-based Sampling for Few-Shot Cross-Domain Keyphrase Extraction (Mishra et al., Findings 2024)
  55. Enhancing Phrase Representation by Information Bottleneck Guided Text Diffusion Process for Keyphrase Extraction (Luo et al., LREC-COLING 2024)
  56. Match More, Extract Better! Hybrid Matching Model for Open Domain Web Keyphrase Extraction (Song et al., Findings 2024)

Evaluation/Empirical Studies/Position

  1. TermITH-Eval: a French Standard-Based Resource for Keyphrase Extraction Evaluation (Bougouin, LREC 2016)
  2. Human-competitive tagging using automatic keyphrase extraction (Medelyan et al., EMNLP 2009)
  3. Approximate Matching for Evaluating Keyphrase Extraction (Zesch & Gurevych, RANLP 2009)
  4. Evaluating N-gram based Evaluation Metrics for Automatic Keyphrase Extraction (Kim et al., COLING 2010)
  5. How Document Pre-processing affects Keyphrase Extraction Performance (Boudin et al., WNUT 2016)
  6. Evaluating anaphora and coreference resolution to improve automatic keyphrase extraction (Basaldella et al., COLING 2016)
  7. Creation and evaluation of large keyphrase extraction collections with multiple opinions (Sterckx et al., Language Resources and Evaluation 2018)
  8. Encoding Conversation Context for Neural Keyphrase Extraction from Microblog Posts (Zhang et al., NAACL 2018)
  9. Keyphrase Generation: A Text Summarization Struggle (Çano & Bojar, NAACL 2019)
  10. Understanding the Tradeoff between Cost and Quality of Expert Annotations for Keyphrase Extraction (Chau et al., LAW 2020)
  11. Large-Scale Evaluation of Keyphrase Extraction Models (Gallina et al., JCDL 2020)
  12. Scientific Keyphrase Identification and Classification by Pre-Trained Language Models Intermediate Task Transfer Learning (Park & Caragea, COLING 2020)
  13. An Empirical Study on Neural Keyphrase Generation (Meng et al., NAACL 2021)
  14. Redefining Absent Keyphrases and their Effect on Retrieval Effectiveness (Boudin & Gallina, NAACL 2021)
  15. KPEval: Towards Fine-Grained Semantic-Based Keyphrase Evaluation (Wu et al., Findings 2024)

Surveys

  1. Automatic Keyphrase Extraction: A Survey of the State of the Art (Hasan & Ng, ACL 2014)
  2. Keyword and Keyphrase Extraction Techniques: A Literature Review (Siddiqi & Sharan, IJCA 2015)
  3. Keyphrase Generation: A Multi-Aspect Survey (Çano & Bojar, IEEE FRUCT 2019)
  4. Automatic keyphrase extraction: a survey and trends (Merrouni et al., JIIS 2020)
  5. A Survey on Recent Advances in Keyphrase Extraction from Pre-trained Language Models (Song et al., Findings 2023)
  6. From statistical methods to deep learning, automatic keyphrase prediction: A survey (Xie et al., Information Processing & Management 2023)

Applications

  1. Automatic phrase indexing for document retrieval (Fagan, SIGIR 1987)
  2. A Just-In-Time Keyword Extraction from Meeting Transcripts (Song et al., NAACL 2013)
  3. Keyphrase Extraction for N-best Reranking in Multi-Sentence Compression (Boudin & Morin, NAACL 2013)
  4. Joint Learning of Chinese Words, Terms and Keywords (Cao et al., EMNLP 2014)
  5. Financial Keyword Expansion via Continuous Word Vector Representations (Tsai & Wang, EMNLP 2014)
  6. Citation-Enhanced Keyphrase Extraction from Research Papers: A Supervised Approach (Caragea et al., EMNLP 2014)
  7. Cross-Lingual Information to the Rescue in Keyword Extraction (Huang et al., ACL 2014)
  8. Automatic Keyword Extraction on Twitter (Marujo et al., ACL-IJCNLP 2015)
  9. Extraction of Keywords of Novelties From Patent Claims (Suzuki & Takatsuka, COLING 2016)
  10. Real-Time Keyword Extraction from Conversations (Meladianos et al., EACL 2017)
  11. Keyphrases Extraction from User-Generated Contents in Healthcare Domain Using Long Short-Term Memory Networks (Saputra et al., BioNLP 2018)
  12. Leveraging Just a Few Keywords for Fine-Grained Aspect Detection Through Weakly Supervised Co-Training (Karamanolakis et al., EMNLP-IJCNLP 2019)
  13. A Human-AI Loop Approach for Joint Keyword Discovery and Expectation Estimation in Micropost Event Detection (Bhardwaj et al., AAAI 2020)
  14. Understanding Medical Conversations with Scattered Keyword Attention and Weak Supervision from Responses (Shi et al., AAAI 2020)
  15. Keywords-Guided Abstractive Sentence Summarization (Li et al., AAAI 2020)
  16. Keyphrase Generation for Scientific Document Retrieval (Boudin et al., ACL 2020)
  17. Diverse, Controllable, and Keyphrase-Aware: A Corpus and Method for News Multi-Headline Generation (Liu et al., EMNLP 2020)
  18. Cross-Media Keyphrase Prediction: A Unified Framework with Multi-Modality Multi-Head Attention and Image Wordings (Wang et al., EMNLP 2020)
  19. Incorporating Multimodal Information in Open-Domain Web Keyphrase Extraction (Wang et al., EMNLP 2020)
  20. Bayesian Critiquing with Keyphrase Activation Vectors for VAE-based Recommender Systems (Yang et al., NAACL 2021)
  21. KPQA: A Metric for Generative Question Answering Using Keyphrase Weights (Lee et al., NAACL 2021)
  22. Arabic Keyphrase Extraction: Enhancing Deep Learning Models with Pre-trained Contextual Embedding and External Features (Alharbi & Al-Muhtasab, WANLP 2022)
  23. Towards Better Multi-modal Keyphrase Generation via Visual Entity Enhancement and Multi-granularity Image Noise Filtering (ACM MM 2023)

Data/Resources

  1. Please also check out KeyphraseExtractionSurvey for early keyphrase extraction datasets.
  2. KP20k, Inspec, Krapivin, NUS, SemEval (Meng et al., ACL 2017)
  3. KPTimes (Gallina et al., INLG 2019)
  4. OpenKP (Xiong et al., EMNLP 2019)
  5. OAGK (Çano et al., NAACL 2019)
  6. StackEx (Yuan et al., ACL 2020)
  7. KPBiomed (Houbre et al., Louhi 2022)
  8. Keyphrase Prediction from Video Transcripts: New Dataset and Directions (Veyseh et al., COLING 2022)
  9. EcommerceMKP (Gao et al., Findings 2022)
  10. AcademicMKP (Gao et al., Findings 2022)
  11. LipKey (Koto et al., COLING 2022)
  12. LDKP (Mahata et al., arXiv 2022)
  13. A new dataset for multilingual keyphrase generation (Piedboeuf & Langlais, NeurIPS 2022 Datasets and Benchmarks track)
  14. Few-TK: A Dataset for Few-shot Scientific Typed Keyphrase Recognition (Lahiri et al., Findings 2024)
  15. EUROPA: A Legal Multilingual Keyphrase Generation Dataset (Salaün et al., ACL 2024)

Tools

  1. DKPro Keyphrases: Flexible and Reusable Keyphrase Extraction Experiments (Erbs et al., ACL 2014)
  2. pke: an open source python-based keyphrase extraction toolkit (Boudin, COLING 2016)
  3. OpenNMT-kpg
  4. keyphrase-generation-rl
  5. ir_using_kg
  6. multilingual_keyphrase_generation
  7. dlkp
  8. DeepKPG
  9. KPEval

Conference/Workshop/Tutorials