`pke` - python keyphrase extraction

pke is an open source python-based keyphrase extraction toolkit. It provides an end-to-end keyphrase extraction pipeline in which each component can be easily modified or extended to develop new models. pke also allows for easy benchmarking of state-of-the-art keyphrase extraction models, and ships with supervised models trained on the SemEval-2010 dataset.

Installation
Minimal example
Getting started
Implemented models
Citing pke

Installation

To install pke from GitHub using pip:

pip install git+https://github.com/boudinfl/pke.git

pke also requires external resources that can be obtained using:

python -m nltk.downloader stopwords
python -m nltk.downloader universal_tagset
python -m spacy download en # download the english model (spaCy 3+ dropped the `en` shortcut; use the full name, e.g. en_core_web_sm)

As of April 2019, pke only supports Python 3.6+.
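
After running the commands above, a quick check can confirm that the packages are importable. The snippet below is only a sketch and is not part of pke: `missing_packages` is a hypothetical helper, and it only checks that the modules can be located, not that the NLTK or spaCy resources were actually downloaded.

```python
from importlib.util import find_spec


def missing_packages(names=("pke", "nltk", "spacy")):
    """Return the names from `names` that cannot be located for import."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("pke, nltk and spacy are importable.")
```

If a package is reported as missing, re-running the corresponding install command is the likely fix.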

Minimal example

pke provides a standardized API for extracting keyphrases from a document. Start by typing the 5 lines below. To use another model, simply replace pke.unsupervised.TopicRank with another model (see the list of implemented models).

import pke

# initialize keyphrase extraction model, here TopicRank
extractor = pke.unsupervised.TopicRank()

# load the content of the document; the document is expected to be in raw
# format (i.e. a plain text file) and preprocessing is carried out using spacy
extractor.load_document(input='/path/to/input.txt', language='en')

# keyphrase candidate selection, in the case of TopicRank: sequences of nouns
# and adjectives (i.e. `(Noun|Adj)*`)
extractor.candidate_selection()

# candidate weighting, in the case of TopicRank: using a random walk algorithm
extractor.candidate_weighting()

# N-best selection, keyphrases contains the 10 highest scored candidates as
# (keyphrase, score) tuples
keyphrases = extractor.get_n_best(n=10)

A detailed example is provided in the examples/ directory.
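
To make the candidate selection step above more concrete, here is a rough, self-contained sketch of what "sequences of nouns and adjectives" means. It is not pke's implementation: `longest_pos_sequences` is a hypothetical helper, the part-of-speech tags are written by hand rather than produced by spacy, and treating `PROPN` as a noun tag is an assumption.

```python
def longest_pos_sequences(tagged_tokens, valid_pos=("NOUN", "PROPN", "ADJ")):
    """Collect maximal runs of adjacent tokens whose tag is in `valid_pos`."""
    candidates = []
    current = []
    for word, pos in tagged_tokens:
        if pos in valid_pos:
            # Extend the current run of nouns/adjectives.
            current.append(word)
        elif current:
            # Any other tag closes the run, which becomes one candidate.
            candidates.append(" ".join(current))
            current = []
    if current:
        candidates.append(" ".join(current))
    return candidates


# Hand-tagged toy sentence (illustrative only).
tagged = [
    ("keyphrase", "NOUN"), ("extraction", "NOUN"), ("is", "AUX"),
    ("a", "DET"), ("useful", "ADJ"), ("task", "NOUN"), (".", "PUNCT"),
]
print(longest_pos_sequences(tagged))
# → ['keyphrase extraction', 'useful task']
```

In pke itself this step is handled by `extractor.candidate_selection()`, so the sketch is only meant to show the idea behind the pattern.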

Getting started

Tutorials and code documentation are available at https://boudinfl.github.io/pke/.

Implemented models

pke currently implements the following keyphrase extraction models:

Unsupervised models
- Statistical models
  - TfIdf [documentation]
  - KPMiner [documentation, article by (El-Beltagy and Rafea, 2010)]
  - YAKE [documentation, article by (Campos et al., 2020)]
- Graph-based models
  - TextRank [documentation, article by (Mihalcea and Tarau, 2004)]
  - SingleRank [documentation, article by (Wan and Xiao, 2008)]
  - TopicRank [documentation, article by (Bougouin et al., 2013)]
  - TopicalPageRank [documentation, article by (Sterckx et al., 2015)]
  - PositionRank [documentation, article by (Florescu and Caragea, 2017)]
  - MultipartiteRank [documentation, article by (Boudin, 2018)]
Supervised models
- Feature-based models
  - Kea [documentation, article by (Witten et al., 2005)]
  - WINGNUS [documentation, article by (Nguyen and Luong, 2010)]
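
Since the models listed above share the API shown in the minimal example, running several of them on the same document can be wrapped in a small helper. The following is a sketch under assumptions: `extract_keyphrases` and `compare_models` are hypothetical names, not part of pke, and the code assumes each model can be run with its default arguments, which may not be appropriate for every model.

```python
def extract_keyphrases(extractor, path, language="en", n=10):
    """Run the steps of the minimal example on an extractor instance."""
    extractor.load_document(input=path, language=language)
    extractor.candidate_selection()
    extractor.candidate_weighting()
    return extractor.get_n_best(n=n)


def compare_models(model_classes, path, language="en", n=10):
    """Map each model class name to its n best (keyphrase, score) tuples."""
    return {
        cls.__name__: extract_keyphrases(cls(), path, language=language, n=n)
        for cls in model_classes
    }


# Usage, assuming pke and its external resources are installed:
#   import pke
#   results = compare_models(
#       [pke.unsupervised.TopicRank, pke.unsupervised.SingleRank],
#       "/path/to/input.txt",
#   )
```

Passing classes rather than instances gives each run a fresh extractor, so no state from one model leaks into the next.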

Citing pke

If you use pke, please cite the following paper:

@InProceedings{boudin:2016:COLINGDEMO,
  author    = {Boudin, Florian},
  title     = {pke: an open source python-based keyphrase extraction toolkit},
  booktitle = {Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations},
  month     = {December},
  year      = {2016},
  address   = {Osaka, Japan},
  pages     = {69--73},
  url       = {http://aclweb.org/anthology/C16-2015}
}
