elastic/elasticsearch-rs

[BUG] Scroll API fails when query hits too many shards

xrl opened this issue · 1 comments

xrl commented

Describe the bug

The size of the scroll_id token is proportional to the number of shards hit by the query.

We found info on scroll_id's generating large strings elastic/elasticsearch-py#971 . And looking over the Scroll code in rust, it always puts the scroll_id in to a GET request and in to the query params. So large queries will cause the Elasticsearch cluster's to report An HTTP line is larger than 4096 bytes..

To Reproduce
Steps to reproduce the behavior:

  1. Hit a lot of shards in a query
  2. Watch ES reject the second page of your scroll logic

Expected behavior

The Scroll code in rust should always use a POST so it can have long scroll_ids.

Screenshots
If applicable, add screenshots to help explain your problem.

Environment (please complete the following information):

  • OS: [e.g. Windows 10 Pro]
  • rustc version [e.g. rustc --version]

Additional context
Add any other context about the problem here.

Hi @xrl

And looking over the Scroll code in rust, it always puts the scroll_id in to a GET request and in to the query params. So large queries will cause the Elasticsearch cluster's to report An HTTP line is larger than 4096 bytes.

Scroll will use a GET request if there is no request body, a POST otherwise

#[doc = "Creates an asynchronous call to the Scroll API that can be awaited"]
pub async fn send(self) -> Result<Response, Error> {
let path = self.parts.url();
let method = match self.body {
Some(_) => Method::Post,
None => Method::Get,
};

This is generated from the REST API spec scroll.json which has deprecated scroll_id as part of the query string parameters in 7.0.0, with a view to remove in the future major version. The API itself will continue to support both GET and POST methods, as different HTTP libraries across the language eco systems (rightly or wrongly 🙂 ) support sending request bodies with a GET request.

In light of the change happening in the spec, I'm going to leave it as is in the client, as the change will make its way into the client. An example of sending the scroll id in the body is in https://github.com/elastic/elasticsearch-rs/blob/master/elasticsearch/docs/Elasticsearch.fn.scroll.md, which will make its way into the docs on docs.rs on the next release.