/rakeR

an implementation of rapid automatic keyword extraction

Primary LanguageRGNU General Public License v2.0GPL-2.0

output
html_document pdf_document
default
default

rakeR

an implimentation of rapid automatic keyword extraction

This is an R implementation of the algorithm as mentioned in paper Automatic keyword extraction from individual documents by Stuart Rose, Dave Engel, Nick Cramer and Wendy Cowley

At the document level, documents are tokenized using a list of stopwords. Each token is scored by counting the frequency of all co-occuring words (note that words co-occur with themselves for the purpose of this algorithm, this is referred to as degree) and dividing the by overall frequency of the words. Phrases are then scored by summing each tokens score. Additionally a function is provided to enrich an existing stop word list by looking across all the keywords for a group of documents.