Suggestion: custom stemmers
Dissimilis opened this issue · 2 comments
Judging by the code `this.stemmer = new PorterStemmer();`, it looks like implementing and passing in my own stemmer is impossible.
It should be trivial to change the API so that a custom stemmer can be assigned in `TokenizationOptions`. But maybe the design of `IStemmer` would need more thought.
P.S. `this.stemmer = new PorterStemmer();` is a nice illustration of "new is glue" :)
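To make the suggestion concrete, here is a rough sketch of what an injectable stemmer might look like. Everything in it is an assumption for illustration: the shape of `IStemmer` (a single `Stem` method), the `Stemmer` property on `TokenizationOptions`, the `Tokenizer` class and the toy `SuffixStemmer` are all hypothetical and do not reflect the library's actual internal types.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical public contract; the library's real IStemmer is internal and may look different.
public interface IStemmer
{
    string Stem(string word);
}

// Toy custom stemmer used only for illustration: strips one fixed suffix.
public sealed class SuffixStemmer : IStemmer
{
    private readonly string suffix;

    public SuffixStemmer(string suffix)
    {
        this.suffix = suffix ?? throw new ArgumentNullException(nameof(suffix));
    }

    public string Stem(string word) =>
        word.Length > suffix.Length && word.EndsWith(suffix, StringComparison.Ordinal)
            ? word.Substring(0, word.Length - suffix.Length)
            : word;
}

// Hypothetical options type: the stemmer is supplied by the caller.
// In this sketch null means "no stemming"; a real library would more likely
// fall back to its built-in Porter stemmer.
public sealed class TokenizationOptions
{
    public IStemmer Stemmer { get; set; }
}

// Hypothetical tokenizer: it receives its stemmer through the options
// instead of calling `new PorterStemmer()` itself.
public sealed class Tokenizer
{
    private readonly IStemmer stemmer;

    public Tokenizer(TokenizationOptions options)
    {
        if (options == null) throw new ArgumentNullException(nameof(options));
        this.stemmer = options.Stemmer;
    }

    public IReadOnlyList<string> Process(string text)
    {
        return text
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(w => w.ToLowerInvariant())
            .Select(w => this.stemmer == null ? w : this.stemmer.Stem(w))
            .ToList();
    }
}

public static class Demo
{
    public static void Main()
    {
        var options = new TokenizationOptions { Stemmer = new SuffixStemmer("ing") };
        var tokenizer = new Tokenizer(options);
        Console.WriteLine(string.Join(", ", tokenizer.Process("Indexing running text")));
    }
}
```

The point of the sketch is only that the tokenizer depends on the interface rather than on a concrete class, so a stemmer for another language could be swapped in without touching the tokenizer.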
Thanks for the suggestion! Yeah, at the moment only Porter stemming is supported - the `IStemmer` interface is internal because it hasn't yet been designed with extensibility in mind.
You raise an interesting point though; there are other stemming algorithms, not least so that words from languages other than English can be stemmed effectively.
It's definitely something to think about...
Custom stemming will be available in v6.