creationix/http-parser-js

Parsing gzipped responses results in incorrect "body"

evanshortiss opened this issue · 10 comments

It looks like I might have come across a parsing issue in relation to gzipped responses.

Using the following express application:

'use strict'

const app = require('express')()
const compression = require('compression')
const cache = require('express-expeditious')({
  engine: require('expeditious-engine-redis')(),
  namespace: 'testing',
  defaultTtl: '1 minutes'
})

const compress = compression({
  // compress all responses, even smaller ones, for testing reasons
  threshold: 2
})

app.use(cache)

app.get('/uncompressed', (req, res) => {
  res.send(JSON.stringify({
    message: 'I should NOT have been compressed!'
  }))
})

app.get('/compressed', compress, (req, res, next) => {
  res.send(JSON.stringify({
    message: 'I should have been compressed!'
  }))
})

app.listen(3030, (e) => {
  if (e) {
    throw e
  }

  console.log('listening on port 3030')
})

The cache middleware receives the following buffer to write to a redis instance. Note that the HTTP body is garbled due to the use of the compression middleware:

"HTTP/1.1 200 OK
X-Powered-By: Express
x-expeditious-cache: miss
Content-Type: text/html; charset=utf-8
ETag: W/"2c-6Ek1FkpDKtrt/CwQ4Wnp6gDfZJk"
Vary: Accept-Encoding
Content-Encoding: gzip
Date: Mon, 07 May 2018 20:30:58 GMT
Connection: keep-alive
Transfer-Encoding: chunked

a
����
35
�V�M-.NLOU�R�T(��/�IQ�H,KUHJM�SH��-(�*HMQT�����k,
0

"

When the above buffer is parsed by this module, the following object is returned. Note that the body is malformed:

{
  "headers": {
    "X-Powered-By": "Express",
    "x-expeditious-cache": "miss",
    "Content-Type": "text/html; charset=utf-8",
    "ETag": "W/\"2c-6Ek1FkpDKtrt/CwQ4Wnp6gDfZJk\"",
    "Vary": "Accept-Encoding",
    "Content-Encoding": "gzip",
    "Date": "Mon, 07 May 2018 20:30:58 GMT",
    "Connection": "keep-alive",
    "Transfer-Encoding": "chunked"
  },
  "upgrade": false,
  "versionMajor": 1,
  "versionMinor": 1,
  "statusCode": 200,
  "statusMessage": "OK",
  "shouldKeepAlive": true,
  "body": "\u001f�\b\u0000\u0000\u0000\u0000\u0000"
}

I haven't gotten a chance to dig deeper yet, but it appears that the compressed body is causing an issue in the parser, since I and others have only stumbled across it in gzipped responses, as detailed in evanshortiss/express-expeditious#17.

Hi, are you actually using http-parser-js? Your example does not seem to be using it, and I don't find any reference to this module in your linked discussions. Is your problem actually with Node or the Node HTTP parser?

If this is actually something that works fine with the Node parser but not with this one, we can investigate further; however, I don't think there's any support for gzipped bodies at all currently in http-parser-js (not sure, that might be handled on the Node side anyway?).

@Jimbly my mistake for not including where the module is used. I should have specified that http-parser-js is used by the express-expeditious module here.

When a body is gzipped the http-parser-js returns that incorrect "body" as seen above and referenced in evanshortiss/express-expeditious#17

It looks like binary bodies are also not parsing correctly - evanshortiss/express-expeditious#14

I haven't had a chance to look at the code in http-parser-js but I assume the issue might be that binary data formats are not supported?

@Jimbly actually, ignore this one for a moment. I think there might be an issue with my codebase.

Okay, let us know if you find an issue, and if so, whether or not it behaves the same with the native Node.js parser as well. PRs always welcome =).

Alright, so here's what I've found. When parsing chunked bodies, the HTTPParser.kOnBody callback I've registered is fired for each chunk except the final one. This is most easily shown by example, so say the following imaginary chunked HTTP response is returned:

app.get('/chunked', (req, res) => {
  res.write('some')
  res.write('chunked')
  res.write('data!')
  res.end()
})

Resulting in:

"HTTP/1.1 200 OK
X-Powered-By: Express
x-expeditious-cache: miss
Date: Fri, 11 May 2018 01:02:09 GMT
Connection: keep-alive
Transfer-Encoding: chunked

4
some
7
chunked
5
data!
0

"

When I parse this using the following code, the final zero-length chunk is ignored. In other words, the callback is invoked for the 4-, 7-, and 5-byte chunks, but not for the 0, so reconstructing the body byte-for-byte fails. Perhaps this is as intended and just an edge case that needs to be handled by my own code?

const { HTTPParser } = require('http-parser-js')
const parser = new HTTPParser(HTTPParser.RESPONSE)

let completeBody = ''
parser[HTTPParser.kOnBody] = function (body, contentOffset, len) {
  // re-emit the chunk's framing (hex length) followed by its payload
  completeBody += `\r\n${len.toString(16)}\r\n`
  completeBody += body.slice(contentOffset, contentOffset + len).toString('ascii')
}
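(One possible workaround, assuming the goal is to re-emit the chunked framing byte-for-byte: since the zero-length terminator never reaches kOnBody, the `0\r\n\r\n` trailer can be appended when the message ends instead. The driver below simulates the parser delivering the three chunks from the example, so the sketch runs stdlib-only; with http-parser-js, the end-of-message signal would come from the parser's message-complete callback rather than a direct call:)

```javascript
// Reassemble the exact chunked framing from kOnBody-style callbacks.
// kOnBody fires once per data chunk; the zero-length terminator never
// reaches it, so the trailer is appended in the message-complete handler.
let completeBody = ''

function onBody (body, contentOffset, len) {
  completeBody += `\r\n${len.toString(16)}\r\n`
  completeBody += body.slice(contentOffset, contentOffset + len).toString('ascii')
}

function onMessageComplete () {
  completeBody += '\r\n0\r\n\r\n'
}

// Simulate the parser delivering the three chunks from the example above
const wire = Buffer.from('somechunkeddata!')
onBody(wire, 0, 4)   // "some"
onBody(wire, 4, 7)   // "chunked"
onBody(wire, 11, 5)  // "data!"
onMessageComplete()

console.log(JSON.stringify(completeBody))
```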

My use case for this is a little odd. I have no experience using the native parser, but I can look into it.

That does sound like a bug, though I'm honestly not familiar with how the chunked handling works in the parser either; I'll try to take a look at it. To try the native parser, just change your line in express-expeditious to: const HTTPParser = process.binding('http_parser').HTTPParser;. That being said, the native parser is not part of the public API and has breaking changes every few major Node releases, so it's not a good fit for your use case, but it would be good to know whether it behaves the same (in which case it's probably some bug in how you're using it) or not (in which case it's probably a bug in http-parser-js).

@evanshortiss I'm unable to reproduce any problem with chunked data. I tried running your first example (though I had to leave out the expeditious-engine-redis line, as I don't have redis) and tried changing its request handler to do the chunked writes you mentioned above; accessing the served page with either curl or Chrome always returned the expected result. Can you provide a stand-alone example that reproduces the problem: either a script that creates a server, issues a request, and asserts the response is as expected, or just code calling into the http-parser-js module that reproduces this?

@Jimbly I gave this a try with the native parser and saw the same result. You're likely correct that it's a bug in my usage, or perhaps my understanding. Perhaps the final zero-length chunk is simply not reported to the body callback by design. In any case, I don't think this is a bug for this module to worry about! 😄

I should also point out that I've realised I no longer need a full HTTP parser for my use case.

Thanks for being so responsive!