barseghyanartur/django-elasticsearch-dsl-drf

Support for track_total_hits

selimt opened this issue · 4 comments

We want to be able to return the exact number of matches from an Elasticsearch query. Currently, if the number of hits exceeds 10,000, hits.total.value is capped at 10,000. We understand that this is an Elasticsearch limitation.

This also makes it hard for us to report an accurate total to our users.

There is a search option called "track_total_hits"; if it is set to "true", then hits.total.value does contain the accurate number of hits:

https://www.elastic.co/guide/en/elasticsearch/reference/7.13/search-your-data.html#track-total-hits

Is there a way to incorporate this option into elasticsearch-dsl-drf? Admittedly, this still makes pagination harder to implement correctly, since paging cannot go past 10,000.

Alternatively, we can use the Count API in ES, but that means running the same query twice. We would then add this additional value to the result.
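
For what it's worth, a rough sketch of that Count API approach with plain elasticsearch-dsl could look like the following (the client setup, index name and query are placeholders, not from our project); the count is an extra request on top of the search itself:

from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search

client = Elasticsearch()

# Placeholder query; the real one would mirror whatever the view searches for.
s = Search(using=client, index="products").query("match", title="shoes")

exact_total = s.count()   # issues a _count request; not capped at 10,000
page = s[0:20].execute()  # the regular search that returns the actual hits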

Thanks.

@selimt:

At the moment it can be solved on the ViewSet definition level as follows:

from django_elasticsearch_dsl_drf.viewsets import DocumentViewSet

class MySearchViewSet(DocumentViewSet):
    def __init__(self, *args, **kwargs):
        super(MySearchViewSet, self).__init__(*args, **kwargs)
        self.search.extra(track_total_hits=True)

@selimt:

Did it work for you?

That didn't work, but this did:

        self.search = self.search.extra(track_total_hits=True)

However, pagination doesn't seem to work with it: if I provide an offset past 10,000, it fails:

{
    "errors": {
        "traceback": [
            "Traceback (most recent call last):\n  File \"/usr/local/lib/python3.7/site-packages/rest_framework/views.py\", line 506, in dispatch\n    response = handler(request, *args, **kwargs)\n  File \"/opt/catalog_server/python/server/catalog_search/views.py\", line 793, in list\n    page = self.paginate_queryset(queryset)\n  File \"/usr/local/lib/python3.7/site-packages/rest_framework/generics.py\", line 171, in paginate_queryset\n    return self.paginator.paginate_queryset(queryset, self.request, view=self)\n  File \"/usr/local/lib/python3.7/site-packages/django_elasticsearch_dsl_drf/pagination.py\", line 379, in paginate_queryset\n    resp = queryset[self.offset:self.offset + self.limit].execute()\n  File \"/usr/local/lib/python3.7/site-packages/elasticsearch_dsl/search.py\", line 715, in execute\n    self, es.search(index=self._index, body=self.to_dict(), **self._params)\n  File \"/usr/local/lib/python3.7/site-packages/elasticsearch/client/utils.py\", line 168, in _wrapped\n    return func(*args, params=params, headers=headers, **kwargs)\n  File \"/usr/local/lib/python3.7/site-packages/elasticsearch/client/__init__.py\", line 1673, in search\n    body=body,\n  File \"/usr/local/lib/python3.7/site-packages/elasticsearch/transport.py\", line 458, in perform_request\n    raise e\n  File \"/usr/local/lib/python3.7/site-packages/elasticsearch/transport.py\", line 426, in perform_request\n    timeout=timeout,\n  File \"/usr/local/lib/python3.7/site-packages/elasticsearch/connection/http_urllib3.py\", line 277, in perform_request\n    self._raise_error(response.status, raw_data)\n  File \"/usr/local/lib/python3.7/site-packages/elasticsearch/connection/base.py\", line 331, in _raise_error\n    status_code, error_message, additional_info\nelasticsearch.exceptions.RequestError: RequestError(400, 'search_phase_execution_exception', 'Result window is too large, from + size must be less than or equal to: [10000] but was [10011]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting.')\n"
        ]
    },
    "data": {
        "offset": "10001"
    }
}
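
For reference, the error above points at the index.max_result_window setting. If one really wanted deep from/size paging, raising it would look roughly like this with the low-level client (the index name and the 50000 value are placeholders, and raising the window has a real memory/CPU cost on deep pages):

from elasticsearch import Elasticsearch

client = Elasticsearch()

# Placeholder index name; this relaxes the from + size cap for that index only.
client.indices.put_settings(
    index="my_index",
    body={"index": {"max_result_window": 50000}},
)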

Ah, yeah, sure, it should indeed be self.search = self.search.extra(track_total_hits=True).
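
Put together, the working variant of the earlier snippet would look roughly like this (the view name is just the placeholder from above):

from django_elasticsearch_dsl_drf.viewsets import DocumentViewSet

class MySearchViewSet(DocumentViewSet):
    def __init__(self, *args, **kwargs):
        super(MySearchViewSet, self).__init__(*args, **kwargs)
        # extra() returns a new Search object, so it must be assigned
        # back to self.search for track_total_hits to take effect.
        self.search = self.search.extra(track_total_hits=True)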

Regarding pagination past 10,000: that is by design in Elasticsearch. I think normal pagination would fail on that one too. When you want to search beyond 10,000, alternative pagination (search_after) should be used.

There's an issue for it.
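
For anyone who needs to go past 10,000 before that lands, here is a minimal search_after sketch with plain elasticsearch-dsl (the index name, sort field and page size are assumptions), outside of this package's pagination classes:

from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search

client = Elasticsearch()

def iterate_all(index="products", page_size=1000):
    # A deterministic sort is required so each hit carries sort values.
    s = Search(using=client, index=index).sort("id")[:page_size]
    s = s.extra(track_total_hits=True)
    search_after = None
    while True:
        current = s.extra(search_after=search_after) if search_after else s
        page = current.execute()
        if not page.hits:
            break
        for hit in page.hits:
            yield hit
        # The sort values of the last hit become the cursor for the next page.
        search_after = list(page.hits[-1].meta.sort)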