solariumphp/solarium

Extract query and big literals parameters HTTP 414

AndreaDiPietro-ADP opened this issue · 15 comments

The Extract query puts literals on the query string, which can cause an HTTP 414 in some real use cases.
I made this Solarium plugin to solve that. If you want, you can include it in the Solarium code base.

It looks good :-)

If you adapt it for the master branch (upcoming 6.0.0) and add a test, we can merge it.

@mkalkbrenner Should this be integrated in the existing PostBigRequest plugin? As another branch for this if:

if (Request::METHOD_GET == $request->getMethod() &&
    strlen($queryString) > $this->getMaxQueryStringLength()) {

Maybe move the actual heavy lifting to two separate protected functions. That allows one to extend the plugin and override either of those (but not necessarily both) if that happens to suit one's needs.
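The split suggested above could look roughly like the following. This is a minimal sketch, not the plugin's actual code: `SketchRequest`, `SketchPostBigRequest`, `SketchAlwaysPost`, and the method names `shouldMoveToBody()` and `moveToBody()` are all hypothetical stand-ins, and the request class only models the few fields needed here.

```php
<?php

// Hypothetical stand-in for a request object; NOT Solarium's real Request class.
final class SketchRequest
{
    public const METHOD_GET = 'GET';
    public const METHOD_POST = 'POST';

    public string $method;
    /** @var array<string, string> */
    public array $params;
    public string $body = '';
    public string $contentType = '';

    public function __construct(string $method, array $params)
    {
        $this->method = $method;
        $this->params = $params;
    }

    public function queryString(): string
    {
        return http_build_query($this->params);
    }
}

// Hypothetical plugin: the decision and the rewrite live in separate protected
// methods, so a subclass can override one without touching the other.
class SketchPostBigRequest
{
    protected int $maxQueryStringLength;

    public function __construct(int $maxQueryStringLength = 1024)
    {
        $this->maxQueryStringLength = $maxQueryStringLength;
    }

    public function preExecute(SketchRequest $request): void
    {
        if ($this->shouldMoveToBody($request)) {
            $this->moveToBody($request);
        }
    }

    // Decision: only GET requests whose query string exceeds the limit.
    protected function shouldMoveToBody(SketchRequest $request): bool
    {
        return SketchRequest::METHOD_GET === $request->method
            && strlen($request->queryString()) > $this->maxQueryStringLength;
    }

    // Rewrite: move the parameters into a form-encoded body and switch to POST.
    protected function moveToBody(SketchRequest $request): void
    {
        $charset = $request->params['ie'] ?? 'utf-8';
        $request->body = $request->queryString();
        $request->contentType = 'application/x-www-form-urlencoded; charset=' . $charset;
        $request->method = SketchRequest::METHOD_POST;
        $request->params = [];
    }
}

// Overrides only the decision; the inherited rewrite is reused unchanged.
class SketchAlwaysPost extends SketchPostBigRequest
{
    protected function shouldMoveToBody(SketchRequest $request): bool
    {
        return SketchRequest::METHOD_GET === $request->method;
    }
}
```

A variant for extract requests would presumably override the rewrite step as well, since such a request already carries a file in its body.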


Detecting an extract query by the handler isn't 100% reliable. You can configure Solr with a different requestHandler for extracting.
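To illustrate the concern, here is a small sketch that compares detection by handler with detection by query type. `SketchQuery` and both functions are hypothetical and only model the idea; the handler name `update/tika` is an invented example of a custom configuration, and the type string `extract` is an assumption for illustration.

```php
<?php

// Hypothetical stand-in for a query object; not Solarium's real query class.
final class SketchQuery
{
    public string $type;
    public string $handler;

    public function __construct(string $type, string $handler)
    {
        $this->type = $type;
        $this->handler = $handler;
    }
}

// Detection by handler: only matches the conventional extracting handler path.
function sketchIsExtractByHandler(SketchQuery $query): bool
{
    return 'update/extract' === $query->handler;
}

// Detection by query type: independent of how the handler is named in Solr.
function sketchIsExtractByType(SketchQuery $query): bool
{
    return 'extract' === $query->type;
}

$default = new SketchQuery('extract', 'update/extract');
$custom = new SketchQuery('extract', 'update/tika'); // invented custom handler name

var_dump(sketchIsExtractByHandler($default)); // bool(true)
var_dump(sketchIsExtractByHandler($custom));  // bool(false): the custom handler is missed
var_dump(sketchIsExtractByType($custom));     // bool(true)
```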


If you need something other than UTF-8, you can use ->setInputEncoding() on the query object. PostBigRequest retrieves it like this with a default fallback:

$charset = $request->getParam('ie') ?? 'utf-8';

If you need something other than UTF-8, you can use ->setInputEncoding() on the query object.

That's new for the upcoming 6.0.0, by the way. It won't work if you need to backport to a 5.x release.

Is there a specific reason that PostBigRequest is a separate plugin and not a feature of the request itself?

@wickedOne I assume that a "plugin" forces people to think about its configuration. To avoid that plugin, you could also modify the configuration of your container, which is Jetty in most Solr installations.
And POST could be blocked by some setups.
In general switching to POST should be avoided wherever possible because you bypass all caches.

@mkalkbrenner thanks for explaining, I wasn't aware Solr caches were request-method sensitive

From the Solr documentation:

Solr only emits cache header elements for GET and HEAD requests. The HTTP standard does not allow cache related headers for POST requests.

Solr's documentation isn't entirely accurate. (You might have noticed I have a penchant for exact docs.)

For HTTP/1.0, RFC 1945 states:

Applications must not cache responses to a POST request because the
application has no way of knowing that the server would return an
equivalent response on some future request.

For HTTP/1.1, RFC 7231 states:

Responses to POST requests are only cacheable when they include
explicit freshness information (see Section 4.2.1 of [RFC7234]).
However, POST caching is not widely implemented.

It would be more precise to say that the HTTP/1.0 standard doesn't allow it. And because

Solr does everything to avoid such problems because it emits HTTP 1.0 and HTTP 1.1 compliant HTTP headers.

, it doesn't emit cache headers for POST requests.
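The two quoted rules can be condensed into a small decision function. This is only a sketch of the quoted passages: the function name is hypothetical, and treating every GET and HEAD response as cacheable is a deliberate simplification that ignores status codes and cache-control directives.

```php
<?php

// Hypothetical helper modelling only the two RFC passages quoted above.
function sketchMayCache(string $httpVersion, string $method, bool $hasExplicitFreshness): bool
{
    $method = strtoupper($method);

    if ('GET' === $method || 'HEAD' === $method) {
        // Simplification: real caches also look at status and headers.
        return true;
    }

    if ('POST' === $method) {
        // HTTP/1.0 (RFC 1945): never cache POST responses.
        // HTTP/1.1 (RFC 7231): only with explicit freshness information.
        return '1.1' === $httpVersion && $hasExplicitFreshness;
    }

    // Other methods are outside the scope of this sketch.
    return false;
}

var_dump(sketchMayCache('1.0', 'POST', true));  // bool(false)
var_dump(sketchMayCache('1.1', 'POST', true));  // bool(true)
var_dump(sketchMayCache('1.1', 'POST', false)); // bool(false)
```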

@thomascorthals thanks for the clarification.
But I think that my statement remains valid for Solarium:

In general switching to POST should be avoided wherever possible because you (might) bypass all caches.

@mkalkbrenner I agree with your statement. We shouldn't switch to POST automatically if the behaviour isn't identical to GET.

I just wanted to provide some context for the way Solr does things.

Just a thought: the reason why we don't switch to POST automatically without PostBigRequest is because it changes caching behaviour. However, an Extract query is already a POST. Couldn't we always put all parameters in the request body instead of the query string?

I don't think so. It is also common to run Solr behind a reverse caching proxy. Removing the GET parameters would lead to false cache hits.
I assume that Solr's HTTP Cache (which is not enabled by default) will just respect GET parameters to distinguish between different searches.
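A rough sketch of the false-hit problem, assuming a proxy that builds its cache key from the method and URL only and is configured to cache the method in question. `sketchCacheKey()` and the paths are hypothetical; real proxies are configurable and may behave differently.

```php
<?php

// Hypothetical cache-key function for a URL-keyed caching proxy.
// The request body is deliberately not part of the key.
function sketchCacheKey(string $method, string $path, array $queryParams): string
{
    ksort($queryParams); // normalise parameter order
    $query = http_build_query($queryParams);

    return $method . ' ' . $path . ('' !== $query ? '?' . $query : '');
}

// Parameters on the query string: different requests get different keys.
$a = sketchCacheKey('POST', '/solr/core/update/extract', ['literal.id' => 'doc1']);
$b = sketchCacheKey('POST', '/solr/core/update/extract', ['literal.id' => 'doc2']);
var_dump($a === $b); // bool(false)

// Parameters moved into the body: the URL is identical, so the keys collide.
$c = sketchCacheKey('POST', '/solr/core/update/extract', []);
$d = sketchCacheKey('POST', '/solr/core/update/extract', []);
var_dump($c === $d); // bool(true): a false cache hit under this keying scheme
```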

@AndreaDiPietro-ADP Could you open a PR?

Within the PR a test and some documentation (and an example) should be added.

The PostBigExtractQuery plugin was released as part of Solarium 6.1.5.