Cannot match queries containing single-letter words
I'm having issues with queries that contain single-letter words such as "a" or "I", specifically phrases such as "drinking a" or "I want". I pulled down the source to test whether the issue was in my usage or in the library, and it appears to be in the library. To repro, open the benchmark test and replace the word "cheese" in the queries and documents with the word "a".
Is this a limitation of the presearchers? I am testing Luwak as a Lucene replacement and have not observed this before, so I do think it is something outside of Lucene. If it is, I don't mind contributing code, but I am not familiar enough with the codebase to know where to start looking. Could you point me in that direction?
Also notable: in my usage I am removing stop words, so it's not an issue of having nothing to match on.
That'll be the StandardAnalyzer automatically removing English stopwords. I get bitten by this every time I write a test with "a b c" as a document :)
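For anyone landing here later, a rough sketch of why this produces no match. This is a simplified stand-in, not Lucene's or Luwak's API: `StopwordDemo`, `analyze`, `matches`, and the abbreviated stopword list are all made up for illustration. It only shows the mechanism: if one side (documents or queries) is analyzed with stopword removal and the other is not, a term like "a" exists on one side only and can never match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Simplified stand-in for an analyzer pipeline; NOT Lucene's API.
final class StopwordDemo {
    // Hypothetical, abbreviated stopword list used only for this illustration.
    static final Set<String> ENGLISH_STOPS = Set.of("a", "an", "the", "and", "of");
    static final Set<String> NO_STOPS = Set.of();

    // Lowercase the text, split on non-alphanumeric characters, drop stopwords.
    static List<String> analyze(String text, Set<String> stopwords) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!raw.isEmpty() && !stopwords.contains(raw)) {
                tokens.add(raw);
            }
        }
        return tokens;
    }

    // True when the query has terms and every one of them appears in the document.
    static boolean matches(List<String> docTerms, List<String> queryTerms) {
        return !queryTerms.isEmpty() && docTerms.containsAll(queryTerms);
    }

    public static void main(String[] args) {
        String doc = "I was drinking a coffee";
        String query = "drinking a";

        // Mismatch: the document side drops "a", the query side keeps it.
        boolean mismatched = matches(analyze(doc, ENGLISH_STOPS), analyze(query, NO_STOPS));
        // Consistent: neither side removes stopwords.
        boolean consistent = matches(analyze(doc, NO_STOPS), analyze(query, NO_STOPS));

        System.out.println("mismatched analyzers match: " + mismatched);
        System.out.println("consistent analyzers match: " + consistent);
    }
}
```

In Lucene itself, the usual way to keep single-letter words is to build the analyzer with an empty stop set (something like `new StandardAnalyzer(CharArraySet.EMPTY_SET)`), and to use that same analyzer for both documents and query parsing. Treat that constructor as an assumption and check it against your Lucene version, since the defaults and package locations have changed between releases.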
That looks about like what was going on! I had a generic object mapper that used reflection to parse a JSON document into a flat Lucene Document, and a separate piece that handled the Lucene Query Parser instantiation (my implementation supports ranges, so there's a bit more involved in the setup). Both require Analyzers, and only one of the two was properly using a StandardAnalyzer without stopwords.
Dang it.
Thanks for your quick help though! I'm interested in contributing to Luwak, as I've grown to be a great fan of it, so you may see something from me in the near future!