Text field facet returns tokenized/analyzed terms
Closed this issue · 1 comment
Hi,
I am using Elasticsearch 5.x with this backend and so far things seem to be working fine, except when faceting text fields: the returned facet strings appear to be tokenized. For example, facet values are converted to lowercase and multi-word field values are split into multiple facets.
I think this is due to something similar to what's described at https://www.elastic.co/guide/en/elasticsearch/guide/current/aggregations-and-analysis.html (a quick _analyze check is sketched after the example below).
Assuming I have the following records with a field called state:
{ "state" : "New York" }
{ "state" : "New Jersey" }
{ "state" : "New Mexico" }
{ "state" : "New York" }
{ "state" : "New York" }
Whenever I run searchqueryset.facet('state')
I get something like:
[
("new", 5),
("york", 3),
("jersey", 1),
("mexico", 1)
]
Instead of
[
("New York", 3),
("New Jersey", 1),
("New Mexico", 1)
]
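For reference, the standard analyzer is what produces those lowercased single-word tokens, and that is what the facet ends up counting. A minimal way to see this, assuming the official elasticsearch-py 5.x client against a local node:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch()  # defaults to http://localhost:9200

# The standard analyzer lowercases and splits "New York" into two tokens.
result = es.indices.analyze(body={"analyzer": "standard", "text": "New York"})
print([token["token"] for token in result["tokens"]])
# ['new', 'york']
```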
I guess one potential solution could be utilizing Elasticsearch multi-fields when indexing faceted fields and then running the terms aggregation against the un-analyzed sub-field.
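To make that concrete, here is a minimal sketch (not the backend's actual code), again assuming elasticsearch-py 5.x; the index name `places`, mapping type `place`, and sub-field name `raw` are just placeholders:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch()

# Map "state" as analyzed text for full-text search, plus an un-analyzed
# "keyword" sub-field ("state.raw") to aggregate/facet on.
es.indices.create(index="places", body={
    "mappings": {
        "place": {
            "properties": {
                "state": {
                    "type": "text",
                    "fields": {"raw": {"type": "keyword"}},
                }
            }
        }
    }
})

for state in ["New York", "New Jersey", "New Mexico", "New York", "New York"]:
    es.index(index="places", doc_type="place", body={"state": state})
es.indices.refresh(index="places")

# Aggregate on the keyword sub-field instead of the analyzed field.
response = es.search(index="places", body={
    "size": 0,
    "aggs": {"states": {"terms": {"field": "state.raw"}}},
})
for bucket in response["aggregations"]["states"]["buckets"]:
    print(bucket["key"], bucket["doc_count"])
# New York 3
# New Jersey 1
# New Mexico 1
```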
I tried to fork this backend to use Elasticsearch multi-fields and things seem to be working as expected so far; check https://github.com/machakux/haystack-elasticsearch5 (backend.py). In this case you don't have to remove faceted=True from your Haystack search index fields.
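For context, this is the kind of index declaration I mean, with faceted=True kept as usual (a sketch only; the myapp app and Location model are hypothetical):

```python
from haystack import indexes

from myapp.models import Location  # hypothetical model with a "state" field


class LocationIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True, use_template=True)
    # faceted=True makes Haystack add a companion "state_exact" facet field;
    # the forked backend indexes it un-analyzed so facet values come back
    # whole ("New York" rather than "new"/"york").
    state = indexes.CharField(model_attr="state", faceted=True)

    def get_model(self):
        return Location
```

Faceting then stays the usual SearchQuerySet().facet('state').facet_counts() call on the application side.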
What do you think? Am I missing something?
Any updates?