Extract query and big literals parameters HTTP 414
AndreaDiPietro-ADP opened this issue · 15 comments
The Extract query puts literals on the query string, which can cause an HTTP 414 (URI Too Long) in some real use cases.
I made this Solarium plugin to solve that. If you want, you can include it in the Solarium code base.
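To illustrate the problem outside of Solarium, here is a minimal Python sketch of the idea: when the encoded query string grows past a limit, the literal.* parameters are moved out of the URL and returned separately so they can be sent in the request body. The function name, the 1024-character limit, and the choice to move only the literals are assumptions made for illustration; this is not the plugin's actual implementation.

```python
from urllib.parse import urlencode

# Hypothetical limit for illustration; servers and proxies each have their own.
MAX_QUERY_STRING_LENGTH = 1024


def build_extract_request(base_url, params, max_len=MAX_QUERY_STRING_LENGTH):
    """Return (url, body_params) for an extract request.

    If the encoded query string fits within max_len, everything stays on
    the URL and body_params is empty. Otherwise the literal.* parameters
    are taken off the URL and returned as body_params.
    """
    query = urlencode(params)
    if len(query) <= max_len:
        return f"{base_url}?{query}", {}

    # Split the parameters: literals go to the body, the rest stays on the URL.
    literals = {k: v for k, v in params.items() if k.startswith("literal.")}
    others = {k: v for k, v in params.items() if not k.startswith("literal.")}
    return f"{base_url}?{urlencode(others)}", literals
```

With a small literal the URL carries everything; with a literal of several thousand characters the URL stays short and the literal is returned for the body instead.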
It looks good :-)
If you adapt it for the master branch (upcoming 6.0.0) and add a test, we can merge it.
@mkalkbrenner Should this be integrated into the existing PostBigRequest plugin, as another branch for this if?
solarium/src/Plugin/PostBigRequest.php, lines 66 to 67 in 559f8a8
Maybe move the actual heavy lifting to two separate protected functions. That allows one to extend the plugin and override either of those (but not necessarily both) if that happens to suit one's needs.
Detecting an extract query by the handler isn't 100% reliable. You can configure Solr with a different requestHandler for extracting.
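One possible reading of the two suggestions above, sketched in Python rather than PHP: the plugin keeps the decision ("does this request qualify?") and the rewrite ("move the parameters to the body") in separate overridable methods, so a subclass can change one without touching the other. All class, method, and handler names here are hypothetical and do not correspond to Solarium's API.

```python
from dataclasses import dataclass, field
from urllib.parse import urlencode


@dataclass
class Request:
    """Toy stand-in for a request object (not Solarium's Request class)."""
    handler: str
    params: dict = field(default_factory=dict)
    method: str = "GET"
    body: str = ""


class PostBigRequestSketch:
    max_query_string_length = 1024  # hypothetical default

    def pre_execute(self, request):
        # The entry point only wires the two hooks together.
        if self._should_convert(request):
            self._convert_to_post(request)
        return request

    def _should_convert(self, request):
        # Hook 1: decide whether the request is too big for a query string.
        return len(urlencode(request.params)) > self.max_query_string_length

    def _convert_to_post(self, request):
        # Hook 2: move all parameters into a form-encoded body.
        request.body = urlencode(request.params)
        request.params = {}
        request.method = "POST"


class CustomExtractHandlerSketch(PostBigRequestSketch):
    # Overrides only the detection, e.g. for a Solr setup where extraction
    # is configured under a non-default requestHandler name.
    extract_handlers = {"update/extract", "my/extract"}

    def _should_convert(self, request):
        return (request.handler in self.extract_handlers
                and super()._should_convert(request))
```

Because detection by handler name is not fully reliable, keeping it in its own method lets a user supply the handler names that match their Solr configuration.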
If you need something other than UTF-8, you can use ->setInputEncoding() on the query object. PostBigRequest retrieves it like this with a default fallback:
solarium/src/Plugin/PostBigRequest.php
Line 68 in 559f8a8
That's new for the upcoming 6.0.0, by the way. It won't work if you need to backport to a 5.x release.
Is there a specific reason PostBigRequest is a separate plugin and not a feature of the request itself?
@wickedOne I assume that a "plugin" forces people to think about its configuration. To avoid the plugin, you could also modify the configuration of your servlet container (Jetty in most Solr installations).
And POST could be blocked by some setups.
In general switching to POST should be avoided wherever possible because you bypass all caches.
@mkalkbrenner Thanks for explaining. I wasn't aware Solr's caches were sensitive to the request method.
From the Solr documentation:
Solr only emits cache header elements for GET and HEAD requests. The HTTP standard does not allow cache related headers for POST requests.
Solr's documentation isn't entirely accurate. (You might have noticed I have a penchant for exact docs.)
For HTTP/1.0, RFC 1945 states:
Applications must not cache responses to a POST request because the
application has no way of knowing that the server would return an
equivalent response on some future request.
For HTTP/1.1, RFC 7231 states:
Responses to POST requests are only cacheable when they include
explicit freshness information (see Section 4.2.1 of [RFC7234]).
However, POST caching is not widely implemented.
It would be more precise to say that the HTTP/1.0 standard doesn't allow it. And because "Solr does everything to avoid such problems because it emits HTTP 1.0 and HTTP 1.1 compliant HTTP headers", it doesn't emit cache headers for POST requests.
@thomascorthals thanks for the clarification.
But I think that my statement remains valid for solarium:
In general switching to POST should be avoided wherever possible because you (might) bypass all caches.
@mkalkbrenner I agree with your statement. We shouldn't switch to POST automatically if the behaviour isn't identical to GET.
I just wanted to provide some context for the way Solr does things.
Just a thought: the reason we don't switch to POST automatically without PostBigRequest is that it changes caching behaviour. However, an Extract query is already a POST. Couldn't we always put all parameters in the request body instead of the query string?
I don't think so. It is also common to run Solr behind a reverse caching proxy. Removing the GET parameters would lead to false cache hits.
I assume that Solr's HTTP Cache (which is not enabled by default) will just respect GET parameters to distinguish between different searches.
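The false-hit concern can be shown with a toy cache key in Python. The assumption, labeled as such, is a cache that builds its key from the method, path, and query string only and never looks at the request body; the function name and key format are made up for this sketch and are not taken from Solr or any particular proxy.

```python
from urllib.parse import urlencode


def cache_key(method, path, query_params):
    """Hypothetical cache key: method + path + sorted query string.

    The request body is deliberately not part of the key, mirroring the
    assumption that the cache only distinguishes requests by their URL.
    """
    query = urlencode(sorted(query_params.items()))
    return f"{method} {path}?{query}"


# Parameters on the query string: two different requests, two different keys.
key_a = cache_key("POST", "/solr/core/update/extract", {"literal.id": "1"})
key_b = cache_key("POST", "/solr/core/update/extract", {"literal.id": "2"})

# Parameters moved to the body: the URL is identical, so the keys collide.
key_a_body = cache_key("POST", "/solr/core/update/extract", {})
key_b_body = cache_key("POST", "/solr/core/update/extract", {})
```

Under that assumption, key_a and key_b differ while key_a_body and key_b_body are equal, which is the collision described above.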
@AndreaDiPietro-ADP Could you open a PR?
Within the PR a test and some documentation (and an example) should be added.
@mkalkbrenner
PR opened.
The PostBigExtractQuery plugin was released as part of Solarium 6.1.5.