Handle GZIP stream correctly
josvanroosmalen opened this issue · 7 comments
Internal Bugtrack reference: 42127
The next major version of MarkLogic Server will support HTTP Chunking and GZIP compression if requested by the request headers.
Some MarkLogic Java Client API unit tests are now failing against this latest MLS build.
Based on a first analysis:
- The Java Client API always sends requests advertising that it accepts GZIP (`accept-encoding: gzip`). The new server will respond with a GZIP-compressed body; the current MLS version ignores the header and sends an uncompressed stream.
- The Java Client API produces undesired outcomes because it makes assumptions that do not hold when a GZIP stream is delivered. For example, `body.contentLength()` is frequently tested to decide whether the response is empty. For a GZIP response this is always -1, so the test sends the code down the wrong program flow (see the sketch below).
The goal of this change is to get the Java Client API working with both uncompressed and compressed streams, so that all integration tests run cleanly against both the current build and the next major release of the server.
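To illustrate the failing assumption, here is a minimal OkHttp sketch (the URL is hypothetical; any server that gzips or chunks its responses will do) contrasting the fragile contentLength-based check with one that treats -1 as "length unknown" rather than "no body":

```java
import okhttp3.OkHttpClient;
import okhttp3.Request;
import okhttp3.Response;
import okhttp3.ResponseBody;

public class GzipContentLengthDemo {
    public static void main(String[] args) throws Exception {
        OkHttpClient client = new OkHttpClient();
        // Hypothetical endpoint used only for illustration.
        Request request = new Request.Builder()
                .url("http://localhost:8000/v1/documents?uri=/example.json")
                .header("Accept-Encoding", "gzip")
                .build();

        try (Response response = client.newCall(request).execute()) {
            ResponseBody body = response.body();
            long length = body.contentLength();

            // Fragile: "> 0" treats the gzip/chunked case (-1, length unknown)
            // as if there were no body at all.
            boolean hasBodyFragile = length > 0;

            // More robust: only an explicit 0 means "no body"; -1 (unknown)
            // and positive values both indicate a body may be present.
            boolean hasBodyRobust = length != 0;

            System.out.println("contentLength=" + length
                    + " fragile=" + hasBodyFragile
                    + " robust=" + hasBodyRobust);
        }
    }
}
```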
I am working on a fix now.
Will need to review the reasons for the current implementation, as described above.
I'm thinking that only requests that return a large payload should be compressed, with the rationale that compressing and decompressing small responses might degrade rather than improve performance.
One way to address this problem would be for the Java Client to submit the accept-encoding: gzip header only for requests that could return a large payload. The Java Client would omit that header for requests that always return a small payload (a sketch of this approach follows the alternatives below).
Another way to address this problem would be for the REST API endpoint to control whether it compresses the response. In this approach, the REST API would need a response configuration pragma or builtin so it could enable or disable support for gzip.
A third way to address the problem would be for the appserver to compress only if the response is larger than a minimum threshold.
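For the first option (client-side selective compression), a minimal sketch of how an OkHttp interceptor could advertise gzip only for selected requests; `isLargePayloadRequest()` and the path checks are hypothetical, not existing Java Client API behavior:

```java
import java.io.IOException;
import okhttp3.Interceptor;
import okhttp3.Request;
import okhttp3.Response;

// Sketch: only advertise gzip for requests expected to return a large payload.
public class SelectiveGzipInterceptor implements Interceptor {
    @Override
    public Response intercept(Chain chain) throws IOException {
        Request original = chain.request();
        if (!isLargePayloadRequest(original)) {
            // Small responses: omit the header so the server replies
            // uncompressed with an explicit Content-Length.
            return chain.proceed(original);
        }
        // Large responses: opt in to gzip.
        Request compressed = original.newBuilder()
                .header("Accept-Encoding", "gzip")
                .build();
        return chain.proceed(compressed);
    }

    // Hypothetical heuristic: treat document reads and searches as "large".
    private boolean isLargePayloadRequest(Request request) {
        String path = request.url().encodedPath();
        return path.contains("/v1/documents") || path.contains("/v1/search");
    }
}
```

The interceptor would be registered via `OkHttpClient.Builder#addInterceptor`. One caveat: when the application sets Accept-Encoding itself, OkHttp (as far as I know) leaves decompression to the caller rather than handling it transparently, so the client would also need to handle the gzip stream explicitly.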
It's not always about efficiency of CPU/time. When dealing with egress in cloud environments, smaller compressed payloads can save money.
I made an MLS fix which (so far) does not require a Java API change to fix the regression. The change is that we do not send `Content-Encoding: gzip` if the body is empty; in that case we send `Content-Length: 0` instead. With this change, the program flow within the Client API works again: if `Content-Encoding: gzip` is present, then there is a body, and in OkHttp `body.contentLength()` is -1; if the body is empty, then `contentLength()` is 0. As long as the API checks `contentLength() != 0` to decide whether a body is present, this works, because that check matches both -1 (gzipped) and positive values (uncompressed). It would go wrong if the API used `contentLength() > 0`, but that does not seem to be the case.
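To make the header rule above concrete, here is an illustrative sketch of the decision expressed in plain Java (this is not MarkLogic Server code; `responseHeaders()` is a hypothetical helper):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class GzipHeaderRule {
    static Map<String, String> responseHeaders(byte[] body, boolean clientAcceptsGzip) {
        Map<String, String> headers = new LinkedHashMap<>();
        if (body.length == 0) {
            // Empty body: never claim gzip; state the zero length explicitly
            // so clients like OkHttp see contentLength() == 0.
            headers.put("Content-Length", "0");
        } else if (clientAcceptsGzip) {
            // Non-empty body and the client advertised gzip: compress and
            // stream it chunked, so the length appears as -1 on the client.
            headers.put("Content-Encoding", "gzip");
            headers.put("Transfer-Encoding", "chunked");
        } else {
            // Uncompressed response with a known length.
            headers.put("Content-Length", String.valueOf(body.length));
        }
        return headers;
    }
}
```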
This solution also seems to fit clients' expectations better. You can find multiple complaints on the internet about different server products that send `Content-Encoding: gzip` followed by no body; other client/middleware/integration products seem to have problems with this as well. OkHttp also has a problem with it, because its GzipSource simply starts trying to read the GZIP header without checking that one is there, causing IO exceptions.
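The failure mode can be reproduced analogously with the JDK's `GZIPInputStream`, which also reads the gzip header eagerly and fails on an empty stream (OkHttp's GzipSource lives in Okio, so this is only an analogous demonstration, not the actual OkHttp code path):

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.zip.GZIPInputStream;

public class EmptyGzipBodyDemo {
    public static void main(String[] args) {
        try {
            // An empty body labelled as gzip: the decoder tries to read the
            // gzip header immediately and fails before any data is consumed.
            new GZIPInputStream(new ByteArrayInputStream(new byte[0]));
        } catch (EOFException e) {
            System.out.println("Empty gzip body rejected: " + e);
        } catch (IOException e) {
            System.out.println("I/O error: " + e);
        }
    }
}
```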
I would propose to keep this ticket open until the regression is confirmed fixed. If Client API changes are still needed, we will use this ticket for them.
Ajit reported that the Java Client regression against the next major MLS release is fixed, so we are closing this one now.