creationix/http-parser-js

returns empty body in some websites

Closed this issue ยท 27 comments

in this websites.

testcase

var HTTPParser = require('http-parser-js').HTTPParser;
process.binding('http_parser').HTTPParser = HTTPParser;

var http = require("http");

//http://www.rockbox.org/
//http://art-director.co.il/

http.get("http://www.rockbox.org/" , function(res) {

    console.log(res.headers);
    var bb = [];

    res.on('data',function(b){
        bb.push(b);
    });
    res.on('end',function(){
        console.log("end");
        console.log(Buffer.concat(bb) + '');
    });
});

It seems that because this header

upgrade: 'h2,h2c'

I try fix it in this commit
magicode@fe4446e
But I do not know if it's true solution

I did some poking around with this and it looks like the actual problem is possibly here:

if (info.upgrade) {
this.nextRequest();

Compared to the native node HTTP parser, starting with some version of node, it looks like it will only stop parsing if the onHeadersComplete callback says to do so. I think, though, with older versions that perhaps it always cancels if it's an upgrade request, but I'm not sure. However, I think it is probably safe to not worry about older versions of Node than the LTS versions at this point (maybe back to 0.12? I know the one thing I was using this for on an old fork of Node is no longer relevant, and I think all users who have ever said anything about this project are on newer versions).

Re: magicode's fix - there's nothing in Node looking for those particular headers (which are part of HTTP 2.0), so we shouldn't be doing anything with them either in the parser. Also, for the test, we should not have tests rely on 3rd party websites, if they change their website, our tests would stop working - some of the other tests serve custom headers, and we should make a test that serves the same headers which should reproduce it without external dependencies (or, maybe, the latest tests include one already and we just need to merge them in).

Same issue with: https://www.lizengo.it
Is there a planned fix for this issue?

Someone also emailed me directly with a similar test case (but needed to keep the domain private for privacy reasons). @Jimbly is there any way I can help with this?

I don't actually remember any details other than what I wrote in the message above, and don't have time to dig into it currently, so if you want to take a stab at a fix, that would be great =).

umutm commented

I believe that the module was compatible until nodejs/node@411d813 (issue: icing/mod_h2#73) as Nodejs/ http parser was also failing to parse the responses in same cases.

Thanks @umutm let me know if this proposed patch works as well. It should be safer and match the node behavior closer.

Emtpy response can be a error from Apache 2.4.25 (downgrade to 2.4.23 will solve the error)

For information http://forum.directadmin.com/showthread.php?t=54237
http2 is enabled default on many servers and not working OK 2.4.25.

Nice to have (request?):

If upgrade header, first check on http1 request, if content OK, then check http2 response.
If response2 = error, and http1 = ok, show different warning

http2 use different layers in apache, we have some sites that are working OK on http1, but showing error on http2 (chrome for example) and the spider is not giving any warning about it.

@blageweg, The goal for this library is to be a drop in replacement for node's native HTTP parser. We don't want to design behavior or logic, just clone what node does.

My proposed patch, hopefully, brings us back in line with the change that node made a while back.

If the people in this issue can confirm my patch solves their problems (makes this work like native), then I'll merge it and release a new version.

umutm commented

The proposed patch (magicode@fe4446e) seems to work all fine.

We send requests to a pretty big list of websites (1 million+), using the patch since the last 24 hours and no known issues yet.

I'll be updating back within few days again if (or if not) we discover any issues.

@creationix unfortunately #31 is not solving the issue, the regexp on this.connection match the upgrade in 'upgrade, close'.
I've written a small test if you want to confirm the issue faster:

process.binding('http_parser').HTTPParser = require('http-parser-js').HTTPParser;
var request = require('request');

request('https://www.lizengo.it', function(error, response, body){
	if(!error){
		console.log('body length lizengo: '+body.length)
	}else{
		console.log('error!')
		console.log(JSON.stringify(error))
	}
});

request('https://www.rockbox.org', function(error, response, body){
	if(!error){
		console.log('body length rockbox: '+body.length)
	}else{
		console.log('error!')
		console.log(JSON.stringify(error))
	}
});

request('https://www.kestile.com', function(error, response, body){
	if(!error){
		console.log('body length kestile: '+body.length)
	}else{
		console.log('error!')
		console.log(JSON.stringify(error))
	}
});

by commenting the first line (process.binding('http_parser').HTTPParser = require('http-parser-js').HTTPParser;) you can get the full sizes of the pages

Probably it was fixed by node in the patch nodejs/node@1050594

Oh wow, that'a big patch in node. I can't port this in the next few days (busy prepping for baby), but thanks for the excellent test case.

Looking forward as well for a fix and a new release

Any update on this issue? I encountered the same issue when connecting to the Magento API.
Magento responds with an "Upgrade: 2hc" header, which in turn leads to an empty response body.

For future reference; i worked around this issue by supplying the "Upgrade:2hc" header in the request, so the server doesn't have to respond with that header. This works but is not a future proof solution of course.

umutm commented

We are still using the proposed patch by magicode@fe4446e and it works smooth.

Thanks :)

@creationix magicode@fe4446e also contains a unit test that covers the change. Can we merge that fork with this codebase?
If you don't have time (congrats on being a dad now :)), i can do it. Let me know.

Send a PR and I'll see if I or @Jimbly can merge it.

I was leery of a change that's adding specific code for a particular header which Node's parser does not have any special cases for, however at this point, without anyone digging deep into why Node's parser doesn't have this issue, as long as it doesn't break any of the other unit tests, seems like any fix is good =). I think all unit tests were passing on Node v8.1 as of my last merge.

Oh I see, this was the patch that's different from node's. Yeah, it would be better long-term to keep compat. with node's parser if possible.

BTW, if anyone wants a http stack different from node's that I am actively working on, see https://github.com/creationix/weblit-js or https://github.com/creationix/weblit depending on which language you prefer.

One large benefit of this over node's http stack is routable websockets. It doesn't treat requests that happen to contain upgrade different from other requests, but does allow for upgrades (like websockets).

Any update on this? We encounter this issue now for https://www.taylor-times.com.
When using the default parser, all is well.
With this parser, the body is empty,

I've tracked down the actual cause of this - when it's possibly an upgrade request, but new versions of Node return skipBody=0 from onHeadersComplete(), we need to not skip the body. It was a bit finicky to track down and get a fix working without breaking actual upgrade requests on old versions of Node, but I think I've got something ready in #50. Assuming Travis clears it, I'll merge and publish it soon. Seems to work in Node v6.x (what Travis checks), v8.x, v9.x (latest), and v0.11-ish (what the included "iojs durability tests" pretend to be, I guess).

It should definitely fix the empty body issue, but if anyone is using this for actual upgrade requests, please give that branch a try and let us know how it goes.

This has been published as v0.4.11, hopefully fixing this issue =).

Woohoo! Thanks for the swift response, i'll test it soon as possible!

Yes, works like a charm. So, i guess we can close this one?

Thanks for the swift response

I wouldn't call "almost 2 years" "swift", but you're welcome. After dealing with annoying PS4 TRCs all day yesterday, diving deep into Node internals sounded "fun" by comparison; I'm glad this is finally resolved =)