/rake-nltk

Python implementation of the Rapid Automatic Keyword Extraction algorithm using NLTK.

Primary LanguagePythonMIT LicenseMIT

rake-nltk

Build Status Licence Coverage Status

RAKE short for Rapid Automatic Keyword Extraction algorithm, is a domain independent keyword extraction algorithm which tries to determine key phrases in a body of text by analyzing the frequency of word appearance and its co-occurance with other words in the text.

Demo

Setup

Using pip

pip install rake-nltk

Directly from the repository

git clone https://github.com/csurfer/rake-nltk.git
python rake-nltk/setup.py install

Usage

from rake_nltk import Rake

r = Rake() # Uses stopwords for english from NLTK, and all puntuation characters.

# If you want to provide your own set of stop words and punctuations to
# r = Rake(<list of stopwords>, <string of puntuations to ignore>

r.extract_keywords_from_text(<text to process>)

r.get_ranked_phrases() # To get keyword phrases ranked highest to lowest.

References

This is a python implementation of the algorithm as mentioned in paper Automatic keyword extraction from individual documents by Stuart Rose, Dave Engel, Nick Cramer and Wendy Cowley

Why I chose to implement it myself?

  • It is extremely fun to implement algorithms by reading papers. It is the digital equivalent of DIY kits.
  • There are some rather popular implementations out there, in python(aneesha/RAKE) and node(waseem18/node-rake) but neither seemed to use the power of NLTK. By making NLTK an integral part of the implementation I get the flexibility and power to extend it in other creative ways, if I see fit later, without having to implement everything myself.
  • I plan to use it in my other pet projects to come and wanted it to be modular and tunable and this way I have complete control.

Versions of python this code is tested against

  • 2.7
  • 3.4
  • 3.5
  • 3.6

Contributing

Bug Reports and Feature Requests

Please use issue tracker for reporting bugs or feature requests.

Development

Pull requests are most welcome.