A pure JS implementation of the Rapid Automatic Keyword Extraction (RAKE) algorithm. Pass in any text corpus and get back a list of keyphrases and keywords.
Supported languages:
- english
- german
- spanish
- italian
- dutch
- portuguese
- swedish
More languages are fairly easy to add, see the stoplist module for details.
Without any further options:
import rake from 'rake-js'
const myKeywords = rake(someTextContent) // ['keyword1', ...]
When the language is known in advance (faster execution):
import rake from 'rake-js'
const myKeywords = rake(someTextContent, { language: 'english' })
When the corpus is divided by something other than whitespace (e.g. ;):
import rake from 'rake-js'
const myKeywords = rake(someTextContent, { delimiters: [';+'] })
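The `+` in `';+'` suggests that each delimiter entry is treated as a regular-expression source string, although that is an assumption and not something stated here. Under that assumption, the splitting step might look roughly like the sketch below. `splitByDelimiters` is a hypothetical helper written for illustration and is not part of rake-js.

```javascript
// Hypothetical helper, NOT part of rake-js.
// Assumption: every entry in `delimiters` is a regex source string such as ';+'.
function splitByDelimiters(text, delimiters) {
  // Wrap each entry in a non-capturing group so alternation stays well-formed
  // and the matched delimiters are not included in the split result.
  const pattern = new RegExp(delimiters.map((d) => `(?:${d})`).join('|'))
  return text
    .split(pattern)
    .map((part) => part.trim())
    .filter(Boolean) // drop empty fragments
}

const parts = splitByDelimiters('red apple;;green pear; blue plum', [';+'])
console.log(parts) // ['red apple', 'green pear', 'blue plum']
```

Because `;+` matches one or more semicolons, the doubled `;;` in the sample input yields no empty fragment.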
This algorithm is fast, compared with other approaches like TextRank. The results are surprisingly good for a cross-language algorithm, and the truly relevant keywords / phrases are included in the result in most cases. For more details about the RAKE algorithm, read the original paper.
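As a rough illustration of the idea behind RAKE, the sketch below splits text into candidate phrases at stopwords and punctuation, then scores each phrase as the sum of its words' degree-to-frequency ratios. It is a minimal sketch and not the code used by this package: `rakeSketch` and the tiny `STOPWORDS` set are hypothetical names made up for the example, and a real stoplist is far longer.

```javascript
// Minimal RAKE-style scoring sketch; NOT the rake-js implementation.
// STOPWORDS is a tiny hypothetical list used only for illustration.
const STOPWORDS = new Set(['a', 'an', 'and', 'the', 'of', 'is', 'for', 'in', 'to', 'with'])

function rakeSketch(text) {
  // 1. Split into fragments at punctuation, then into candidate phrases at stopwords.
  const phrases = []
  for (const fragment of text.toLowerCase().split(/[.,;:!?()\n]+/)) {
    let current = []
    for (const word of fragment.split(/\s+/).filter(Boolean)) {
      if (STOPWORDS.has(word)) {
        if (current.length > 0) phrases.push(current)
        current = []
      } else {
        current.push(word)
      }
    }
    if (current.length > 0) phrases.push(current)
  }

  // 2. Word frequency, and word degree (co-occurrences within phrases, itself included).
  const freq = new Map()
  const degree = new Map()
  for (const phrase of phrases) {
    for (const word of phrase) {
      freq.set(word, (freq.get(word) || 0) + 1)
      degree.set(word, (degree.get(word) || 0) + phrase.length)
    }
  }

  // 3. Phrase score = sum of degree / frequency over the words in the phrase.
  const scores = new Map()
  for (const phrase of phrases) {
    const score = phrase.reduce((sum, w) => sum + degree.get(w) / freq.get(w), 0)
    scores.set(phrase.join(' '), score)
  }

  // Highest-scoring phrases first.
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([phrase, score]) => ({ phrase, score }))
}

const ranked = rakeSketch('Keyword extraction is fast. Rapid keyword extraction for documents.')
console.log(ranked[0]) // { phrase: 'rapid keyword extraction', score: 8 }
```

Longer phrases built from frequently co-occurring words rise to the top, which is why multi-word keyphrases tend to outrank single words.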
There are still rough edges in the code, but I tried to translate the abstract algorithm into a solid software package that is tested and typesafe. I wrote it because I was very disappointed with the existing solutions on npm, and I hope this repository will be easier to contribute to in the future.
- support more languages (only a handful are whitelisted for now)
- duplicate keyword filtering
- check browser compatibility
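Until duplicate filtering is built in, callers can post-process the result themselves. The sketch below assumes the result is a plain array of strings, as the usage comment above suggests. `dedupeKeywords` is a hypothetical helper and not part of this package.

```javascript
// Hypothetical post-processing helper, NOT part of rake-js.
// Assumption: the keyword list is an array of strings.
function dedupeKeywords(keywords) {
  const seen = new Set()
  const unique = []
  for (const keyword of keywords) {
    const trimmed = keyword.trim()
    const key = trimmed.toLowerCase() // compare case-insensitively
    if (key && !seen.has(key)) {
      seen.add(key)
      unique.push(trimmed) // keep the first spelling encountered
    }
  }
  return unique
}

console.log(dedupeKeywords(['Keyword Extraction', 'keyword extraction', 'stoplist ']))
// ['Keyword Extraction', 'stoplist']
```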
LGPL-3.0.
You can use this package in all your free or commercial products without any issues, but I want bugfixes and improvements to this algorithm to flow back into the public code repository.