Figure out a way to destem terms more quickly

Question

Closed this issue 11 years ago · 1 comments

A large document set makes the current destemming process take a LONG time. There has to be a more efficient algorithm.

Answer 1 · 2013-12-23T21:10:13.000Z

Some ideas:
(1) search only until one version of the stemmed term stands out, i.e. it accounts for a certain percentage of the matches once we have reviewed a certain amount of our body of text.
*** I could use a list comprehension to find all matches in a given doc, instead of explicitly iterating over each term in the doc.
(2) find some module/library that is really fast at searching lists... I'm still looking.
(3) use threading -- I don't think this is a good idea.
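
A rough sketch of idea (1), including the list comprehension note, might look like the following. It assumes each doc is already a list of tokens and that "destemming" means picking the most common surface form for a stem. The names `toy_stem`, `destem`, `min_matches` and `min_share` are made up for illustration, and `toy_stem` is only a stand-in for whatever stemmer the project actually uses.

```python
from collections import Counter


def toy_stem(word):
    """Hypothetical stand-in stemmer: strips a few common suffixes."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def destem(stem, docs, stem_fn=toy_stem, min_matches=5, min_share=0.6):
    """Return the dominant surface form of `stem`.

    Stops scanning as soon as at least `min_matches` matches have been
    seen and one form accounts for at least `min_share` of them.
    Both thresholds are assumed values, not tuned ones.
    """
    counts = Counter()
    for doc in docs:
        # Collect every match in this doc in one list comprehension.
        counts.update([tok for tok in doc if stem_fn(tok) == stem])
        total = sum(counts.values())
        if total >= min_matches:
            form, n = counts.most_common(1)[0]
            if n / total >= min_share:
                return form  # one form stands out, stop early
    # No form stood out: fall back to the most common one, or the stem itself.
    return counts.most_common(1)[0][0] if counts else stem


docs = [
    ["connected", "connects", "connected"],
    ["connected", "connected", "network"],
    ["connecting"] * 50,  # never scanned if the early exit triggers
]
result = destem("connect", docs)
```

With these thresholds the scan stops after the second doc, so the result reflects only the text reviewed so far. That is the trade-off in idea (1): less scanning in exchange for possibly missing a form that dominates later in the corpus.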