
Extract keywords from sentences or replace keywords in sentences.




This module can be used to replace keywords in sentences or extract keywords from sentences.

Installation

$ pip install flashtext


Extract keywords
>>> from flashtext.keyword import KeywordProcessor
>>> keyword_processor = KeywordProcessor()
>>> keyword_processor.add_keyword('Big Apple', 'New York')
>>> keyword_processor.add_keyword('Bay Area')
>>> keywords_found = keyword_processor.extract_keywords('I love Big Apple and Bay Area.')
>>> keywords_found
['New York', 'Bay Area']

Replace keywords
>>> keyword_processor.add_keyword('New Delhi', 'NCR region')
>>> new_sentence = keyword_processor.replace_keywords('I love Big Apple and new delhi.')
>>> new_sentence
'I love New York and NCR region.'
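
To illustrate the replacement idea, here is a simplified, hypothetical sketch in plain Python. It is not FlashText's implementation. It assumes words are runs of ASCII letters, digits or underscores, matches keywords case-insensitively on whole words only, and prefers the longest keyword at each position. `replace_phrases` is a made-up helper name.

```python
import re

# Assumption: a "word" is a run of ASCII letters, digits or underscores.
WORD_RE = re.compile(r"\w+", re.ASCII)
# Split text into alternating word / non-word runs so punctuation is preserved.
TOKEN_RE = re.compile(r"\w+|\W+", re.ASCII)


def replace_phrases(text: str, mapping: dict[str, str]) -> str:
    """Hypothetical helper: replace whole-word keywords (case-insensitive)
    with their clean names, preferring the longest keyword at each position."""
    table = {keyword.lower(): clean for keyword, clean in mapping.items()}
    # Longest keyword, measured in words.
    max_words = max((len(WORD_RE.findall(k)) for k in table), default=0)

    tokens = TOKEN_RE.findall(text)
    out: list[str] = []
    i = 0
    while i < len(tokens):
        replacement, consumed = None, 1
        if WORD_RE.fullmatch(tokens[i]):
            # n words span 2n - 1 tokens: word, separator, word, ...
            for n in range(max_words, 0, -1):
                end = i + 2 * n - 1
                if end > len(tokens):
                    continue
                candidate = "".join(tokens[i:end]).lower()
                if candidate in table:
                    replacement, consumed = table[candidate], end - i
                    break
        out.append(tokens[i] if replacement is None else replacement)
        i += consumed
    return "".join(out)
```

For example, `replace_phrases('I love Big Apple and new delhi.', {'Big Apple': 'New York', 'New Delhi': 'NCR region'})` returns `'I love New York and NCR region.'`, mirroring the doctest above.
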
Case-sensitive example
>>> from flashtext.keyword import KeywordProcessor
>>> keyword_processor = KeywordProcessor(case_sensitive=True)
>>> keyword_processor.add_keyword('Big Apple', 'New York')
>>> keyword_processor.add_keyword('Bay Area')
>>> keywords_found = keyword_processor.extract_keywords('I love big Apple and Bay Area.')
>>> keywords_found
['Bay Area']

No clean name for keywords
>>> from flashtext.keyword import KeywordProcessor
>>> keyword_processor = KeywordProcessor()
>>> keyword_processor.add_keyword('Big Apple')
>>> keyword_processor.add_keyword('Bay Area')
>>> keywords_found = keyword_processor.extract_keywords('I love big Apple and Bay Area.')
>>> keywords_found
['Big Apple', 'Bay Area']
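
The case-sensitivity and clean-name examples come down to how a keyword is normalized and what gets reported when it matches. Below is a hypothetical sketch of that bookkeeping. The `KeywordTable` class and its methods are illustrative only, not part of FlashText. In case-insensitive mode keys are lowercased, and a keyword added without a clean name reports itself with its original casing.

```python
from typing import Optional


class KeywordTable:
    """Hypothetical sketch of keyword -> clean-name bookkeeping."""

    def __init__(self, case_sensitive: bool = False) -> None:
        self.case_sensitive = case_sensitive
        self._clean_names: dict[str, str] = {}

    def _key(self, text: str) -> str:
        # Case-insensitive mode stores and looks up lowercased keys.
        return text if self.case_sensitive else text.lower()

    def add_keyword(self, keyword: str, clean_name: Optional[str] = None) -> None:
        # With no clean name, the keyword itself (original casing) is reported.
        self._clean_names[self._key(keyword)] = keyword if clean_name is None else clean_name

    def lookup(self, phrase: str) -> Optional[str]:
        # Clean name for an exact (possibly case-folded) match, else None.
        return self._clean_names.get(self._key(phrase))
```

With `case_sensitive=True`, `lookup('big Apple')` returns `None` while `lookup('Big Apple')` returns the clean name. With the default, `lookup('big Apple')` after `add_keyword('Big Apple')` returns `'Big Apple'`, matching the last example.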

API doc

Documentation can be found at FlashText Read the Docs.

Test

$ git clone https://github.com/vi3k6i5/flashtext
$ cd flashtext
$ pip install pytest
$ python setup.py test

Why not Regex?

FlashText uses a custom algorithm based on the Aho-Corasick algorithm and a trie dictionary.
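
A trie (prefix tree) stores each keyword one character at a time, so a single left-to-right pass over the sentence can check every keyword that starts at a word boundary. The following is a simplified, hypothetical sketch using nested dicts and a sentinel key `_end`, not FlashText's actual code. It assumes ASCII letters, digits and `_` are word characters, and it omits Aho-Corasick failure links, simply restarting from the trie root at the next word.

```python
import string
from typing import Any, Optional

# Assumption: these characters make up words; anything else is a boundary.
WORD_CHARS = set(string.ascii_letters + string.digits + "_")
END = "_end"  # hypothetical sentinel key; never equals a single character


def build_trie(keywords: dict[str, str]) -> dict[str, Any]:
    """Insert each lowercased keyword one character at a time."""
    root: dict[str, Any] = {}
    for keyword, clean_name in keywords.items():
        node = root
        for ch in keyword.lower():
            node = node.setdefault(ch, {})
        node[END] = clean_name
    return root


def extract(sentence: str, trie: dict[str, Any]) -> list[str]:
    """Scan once left to right, keeping the longest whole-word match at each word start."""
    text = sentence.lower()
    found: list[str] = []
    i, n = 0, len(text)
    while i < n:
        # Only start a match at the beginning of a word.
        if i > 0 and text[i - 1] in WORD_CHARS:
            i += 1
            continue
        node, j = trie, i
        hit: Optional[str] = None
        hit_end = i
        while j < n and text[j] in node:
            node = node[text[j]]
            j += 1
            # Accept a keyword only if it also ends on a word boundary.
            if END in node and (j == n or text[j] not in WORD_CHARS):
                hit, hit_end = node[END], j
        if hit is not None:
            found.append(hit)
            i = hit_end  # resume right after the matched keyword
        else:
            i += 1
    return found
```

With `trie = build_trie({'Big Apple': 'New York', 'Bay Area': 'Bay Area'})`, calling `extract('I love Big Apple and Bay Area.', trie)` returns `['New York', 'Bay Area']`. Each character is visited a bounded number of times per keyword prefix, rather than once per keyword.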

Doing the same with regex takes far longer:

Docs count    # Keywords    Regex       FlashText
1.5 million   2K            16 hours    Not measured
2.5 million   10K           15 days     15 minutes
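
For comparison, a regex solution typically compiles all keywords into one alternation guarded by word boundaries. Below is a hypothetical sketch of such a baseline. The helper names, the longest-first ordering and the use of `re.IGNORECASE` are illustrative choices. It is not the exact code behind the timings above.

```python
import re


def build_pattern(keywords: list[str]) -> "re.Pattern[str]":
    # One big alternation; longest keywords first so "New York" beats "New".
    alternation = "|".join(re.escape(k) for k in sorted(keywords, key=len, reverse=True))
    # \b only acts as a boundary when a keyword starts and ends with a word character.
    return re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)


def regex_extract(sentence: str, mapping: dict[str, str]) -> list[str]:
    """Hypothetical regex baseline: keyword -> clean name, case-insensitive."""
    if not mapping:
        return []  # an empty alternation would match the empty string everywhere
    clean_names = {k.lower(): v for k, v in mapping.items()}
    pattern = build_pattern(list(mapping))
    return [clean_names[m.group(0).lower()] for m in pattern.finditer(sentence)]
```

Python's `re` engine generally tries the alternatives at each candidate position, so matching cost tends to grow with the number of keywords. That is the scaling problem the table illustrates.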

The idea for this library came from a StackOverflow question.

License

The project is licensed under the MIT license.