Custom stop words
Is it possible to customize the stop words, so that I can provide a list other than the default one, or disable stop words altogether?
Context: I'm setting up Haystack for the search in https://rocketvalidator.com/html-validation - currently it just uses a simple search by substring but I want to use Haystack instead. So far it's going great!
During the integration, I found that the results were not as expected in many searches, and it looks like this was because most of the titles include characters like double quotes.
So when I searched for something containing double quotes, these guides would appear first as they scored higher because they have many double quotes.
I guess this could be solved by adding the double quotes (and other characters like parentheses, brackets, < and >, etc.) to the stop words. My workaround was to clean up the strings, both during the load and the search:
defp cleanup(str) do
  str
  |> String.replace(["“", "”", "<", ">", "(", ")", ".", ",", ";", ":"], "")
  |> String.trim()
end
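For anyone wanting to try the same workaround, here is a minimal self-contained sketch of applying one cleanup function on both the indexing side and the query side, so that documents and searches are normalized identically. The module name `SearchCleanup`, the helpers `prepare_doc/1` and `prepare_query/1`, and the `%{title: ...}` document shape are assumptions made for illustration; none of them come from Haystack or from the original application.

```elixir
defmodule SearchCleanup do
  # Characters stripped before indexing and before searching.
  # Same list as in the workaround above.
  @noise ["“", "”", "<", ">", "(", ")", ".", ",", ";", ":"]

  # Removes the noise characters and trims surrounding whitespace.
  def cleanup(str) do
    str
    |> String.replace(@noise, "")
    |> String.trim()
  end

  # Hypothetical document shape: a map with a :title key.
  # Called on each document before it is loaded into the index.
  def prepare_doc(%{title: title} = doc), do: %{doc | title: cleanup(title)}

  # Called on the user's search string before querying.
  def prepare_query(query) when is_binary(query), do: cleanup(query)
end
```

With this in place, `SearchCleanup.prepare_query("“must not appear”")` and a title containing the same quoted phrase are reduced to the same plain words before they reach the index.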
After that, I found that a search for "must not appear", like this one: https://rocketvalidator.com/html-validation?search=must+not+appear, returned no results using Haystack, and that's because these are all stop words.
Finally, for non-English content it would be great to be able to customize the stop words.
Hey @jaimeiniesta,
Yeah, you can pass a custom list of transformer modules when adding a field:
haystack/lib/haystack/index/field.ex
Line 46 in 55e8b1f
So you could either pass your own implementation of stop words, or remove it completely. And you can do that on a per-field basis.
Again this needs to be added to the documentation 😅
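Until the documentation covers this, here is a rough sketch of what a custom stop-words step could look like. It is deliberately standalone: the module name `CustomStopWords`, the `transform/1` function name, and the use of plain strings as tokens are all assumptions made for illustration, not Haystack's confirmed transformer contract. Check the transformer modules and `lib/haystack/index/field.ex` in the repository for the actual callback and token shape before adapting it.

```elixir
defmodule CustomStopWords do
  # Hypothetical, standalone sketch; it does not implement any Haystack behaviour.
  # Tokens are modeled as plain strings, which may differ from Haystack's tokens.

  # A few Spanish stop words, chosen only as an example of a non-English list.
  @stop_words MapSet.new(~w(el la los las de y que en))

  # Drops every token found in the custom list, ignoring case.
  def transform(tokens) when is_list(tokens) do
    Enum.reject(tokens, fn token ->
      MapSet.member?(@stop_words, String.downcase(token))
    end)
  end
end
```

For example, `CustomStopWords.transform(["la", "etiqueta", "de", "cierre"])` returns `["etiqueta", "cierre"]`. Leaving out a stop-words step entirely for a field would let queries such as "must not appear" match again, at the cost of indexing very common words.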
Ah, that's cool then. I'll wait for that documentation. Thanks! 😎