Text field facet returns tokenized/analyzed terms
Closed this issue · 1 comment
Hi,
I am using Elasticsearch 5.x with this backend and so far things seem to be working fine, except when faceting text fields: the returned facet strings appear to be tokenized. For example, facet values are converted to lowercase and multi-word field values are split into multiple facets.
I think this is due to something similar to what's described at https://www.elastic.co/guide/en/elasticsearch/guide/current/aggregations-and-analysis.html (a quick _analyze check is sketched after the example below).
Assuming I have the following records with a field called state:
{ "state" : "New York" }
{ "state" : "New Jersey" }
{ "state" : "New Mexico" }
{ "state" : "New York" }
{ "state" : "New York" }
Whenever I run searchqueryset.facet('state')
I get something like:
[
("new", 5),
("york", 3),
("jersey", 1),
("mexico", 1)
]
Instead of
[
("New York", 3),
("New Jersey", 1),
("New Mexico", 1)
]
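For reference, the standard analyzer is what produces those lowercased single-word tokens, and that is what the facet ends up counting. A minimal way to see this, assuming the official elasticsearch-py 5.x client against a local node:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch()  # defaults to http://localhost:9200

# The standard analyzer lowercases and splits "New York" into two tokens.
result = es.indices.analyze(body={"analyzer": "standard", "text": "New York"})
print([token["token"] for token in result["tokens"]])
# ['new', 'york']
```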
I guess one potential solution could be utilizing Elasticsearch multi-fields when indexing faceted fields and then running the terms aggregation against the un-analyzed sub-field.
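To make that concrete, here is a minimal sketch (not the backend's actual code), again assuming elasticsearch-py 5.x; the index name `places`, mapping type `place`, and sub-field name `raw` are just placeholders:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch()

# Map "state" as analyzed text for full-text search, plus an un-analyzed
# "keyword" sub-field ("state.raw") to aggregate/facet on.
es.indices.create(index="places", body={
    "mappings": {
        "place": {
            "properties": {
                "state": {
                    "type": "text",
                    "fields": {"raw": {"type": "keyword"}},
                }
            }
        }
    }
})

for state in ["New York", "New Jersey", "New Mexico", "New York", "New York"]:
    es.index(index="places", doc_type="place", body={"state": state})
es.indices.refresh(index="places")

# Aggregate on the keyword sub-field instead of the analyzed field.
response = es.search(index="places", body={
    "size": 0,
    "aggs": {"states": {"terms": {"field": "state.raw"}}},
})
for bucket in response["aggregations"]["states"]["buckets"]:
    print(bucket["key"], bucket["doc_count"])
# New York 3
# New Jersey 1
# New Mexico 1
```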
I tried to fork this backend to use Elasticsearch multi-fields and things seem to be working as expected so far; check https://github.com/machakux/haystack-elasticsearch5 (backend.py). In this case you don't have to remove faceted=True from your Haystack search index fields.
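For context, this is the kind of index declaration I mean, with faceted=True kept as usual (a sketch only; the myapp app and Location model are hypothetical):

```python
from haystack import indexes

from myapp.models import Location  # hypothetical model with a "state" field


class LocationIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True, use_template=True)
    # faceted=True makes Haystack add a companion "state_exact" facet field;
    # the forked backend indexes it un-analyzed so facet values come back
    # whole ("New York" rather than "new"/"york").
    state = indexes.CharField(model_attr="state", faceted=True)

    def get_model(self):
        return Location
```

Faceting then stays the usual SearchQuerySet().facet('state').facet_counts() call on the application side.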
What do you think? Am I missing something?
Any updates?