boudinfl/pke

Adding Flashtext to the library?

BradKML opened this issue · 3 comments

ygorg commented

Hi, pke aims to implement the keyphrase extraction techniques proposed by the scientific community, so that they can be compared and made available.
This project is not (as of right now) meant to be a speed-optimised solution to keyphrase extraction (as flashtext seems to be).

If you want an approach similar to flashtext in pke, you can use longest_keyword_sequence_selection:

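import pke

extractor = pke.unsupervised.FirstPhrases()
extractor.load_document(mydoc)  # mydoc: the input text
# keep only the longest sequences made of words from list_of_words
extractor.longest_keyword_sequence_selection(list_of_words)
extractor.candidate_weighting()
extractor.get_n_best()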
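
To make the selection step concrete, here is a minimal plain-Python sketch of the idea as I understand it: keep the longest runs of consecutive tokens that all appear in a given word list. The function name `longest_keyword_runs` is hypothetical and this is not pke's code; pke may normalise tokens differently (for example by stemming), whereas this sketch only lowercases.

```python
def longest_keyword_runs(tokens, keywords):
    """Return maximal runs of consecutive tokens that all belong to `keywords`."""
    allowed = {k.lower() for k in keywords}
    runs, current = [], []
    for token in tokens:
        if token.lower() in allowed:
            # Token is in the word list: extend the current run.
            current.append(token)
        else:
            # Token is not in the word list: close the current run, if any.
            if current:
                runs.append(" ".join(current))
            current = []
    if current:
        runs.append(" ".join(current))
    return runs


tokens = "keyphrase extraction tools rank candidate phrases".split()
candidates = longest_keyword_runs(tokens, ["keyphrase", "extraction", "phrases"])
print(candidates)
```

Each run becomes one candidate, so adjacent listed words are merged into a single phrase rather than reported separately.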

Though I'm not sure I understand why you suggest adding flashtext to pke.

BradKML commented

I am mainly looking for other models out there that are not yet included in PKE, for comparison and utility.

ygorg commented

I understand. But flashtext differs greatly from the methods implemented in pke.
The definition of "keyword" is different: in flashtext, "keywords" are search patterns; in pke, they represent the concepts of the document.

Flashtext is a string searching algorithm (so the "keywords" must be known in advance).
In pke, the goal is to identify keyphrases that are not known in advance and that are close to reference keyphrases (an oversimplification).
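
To illustrate what "known in advance" means, here is a rough plain-Python sketch of dictionary-based keyword search over a word-level trie. It is not flashtext's implementation, and `build_trie` and `find_keywords` are hypothetical names; it only shows that this kind of search can return nothing but the terms it was given.

```python
END = object()  # sentinel key marking the end of a stored keyword


def build_trie(keywords):
    """Store each keyword as a path of lowercased words in a nested dict."""
    trie = {}
    for keyword in keywords:
        node = trie
        for word in keyword.lower().split():
            node = node.setdefault(word, {})
        node[END] = keyword
    return trie


def find_keywords(text, trie):
    """Scan the text left to right, returning the longest stored keyword at each position."""
    words = text.lower().split()
    found, i = [], 0
    while i < len(words):
        node, j, match, match_end = trie, i, None, i
        while j < len(words) and words[j] in node:
            node = node[words[j]]
            j += 1
            if END in node:
                match, match_end = node[END], j
        if match is not None:
            found.append(match)
            i = match_end  # skip past the matched keyword
        else:
            i += 1
    return found


trie = build_trie(["keyphrase extraction", "graph"])
matches = find_keywords("Graph methods for keyphrase extraction work", trie)
print(matches)
```

A phrase such as "methods" is never returned here, however central it is to the text, because it was not in the keyword list. That is the gap between string searching and keyphrase extraction.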

I don't think adding flashtext to pke is relevant. Their goals are different.