Bergvca/string_grouper

Question / suggestion to use multiple n-grams to get more features


Hi @Bergvca and @ParticularMiner,

Hope you are doing well.

I am working on the same project again and have a question / suggestion: would it be possible to use n-grams of several sizes at once to get more features? Currently there is only a single setting for this: ngram_size: The amount of characters in each n-gram. Default is 3.

What if ngram_size could also be given as a list like [2, 3, 4], so that each string's vector contains the components from its 2-grams plus those from its 3-grams and 4-grams?
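To make the idea concrete, here is a rough sketch of what I mean. The helper name multi_ngrams and its sizes parameter are made up for illustration and are not part of string_grouper; I have also not checked how this would fit into the library's internals. It only shows how n-grams of several sizes could be pooled into one feature set, using raw counts instead of TF-IDF weights.

```python
from collections import Counter
from typing import Iterable, List


def multi_ngrams(text: str, sizes: Iterable[int] = (2, 3, 4)) -> List[str]:
    """Return the character n-grams of `text` for every size in `sizes`.

    Hypothetical helper for illustration only; not a string_grouper function.
    """
    grams: List[str] = []
    for n in sizes:
        # A string shorter than n simply contributes no n-grams of that size.
        grams.extend(text[i:i + n] for i in range(len(text) - n + 1))
    return grams


# Each distinct n-gram would become one vector component.
# Raw counts are used here; a real pipeline would apply TF-IDF weighting.
features = Counter(multi_ngrams("abcd"))
print(sorted(features))
```

For comparison, scikit-learn's TfidfVectorizer can already do something similar on its own with analyzer='char' and ngram_range=(2, 4), and it also accepts a callable as analyzer, so a function like the one above could in principle be plugged in that way.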

What do you think?

By the way, the string_grouper approach is really good in terms of speed and efficiency. Great work!

Thank you,
iibarant