Unhandled incorrect header/body check
synzen opened this issue · 9 comments
Here I used node-fetch to with Accept-Encoding: gzip,deflate
(the default) for a feed that sends a erroneously sends back a response with content-encoding: gzip
when the body is not actually compressed. Unfortunately it seems that feedparser doesn't detect such a problem here. node-fetch doesn't throw any errors for this unless you call res.text()
, res.json()
(see node-fetch/node-fetch#45), and instead I immediately pipe it to feedparser rather than letting node-fetch handle it
In my example below feedparser doesn't emit error and doesn't emit end.
feedparser v2.2.9
Node v12.7.0
Feed: https://eliteesp.es/category/galnet/feed/
const FeedParser = require('feedparser')
const fetch = require('node-fetch')
const feedparser = new FeedParser()
fetch('https://eliteesp.es/category/galnet/feed/').then(res => {
if (res.status !== 200) throw new Error('Non-200 status code')
res.body.pipe(feedparser)
})
feedparser.on('error', function (error) {
console.log('fp err', error)
// always handle errors
});
feedparser.on('readable', function () {
// This is where the action is!
var stream = this; // `this` is `feedparser`, which is a stream
var meta = this.meta; // **NOTE** the "meta" is always available in the context of the feedparser instance
var item;
while (item = stream.read()) {
console.log(item);
}
})
feedparser.on('end', function () {
console.log('end')
})
There is no output when running this. Not sure if feedparser is supposed to handle this case though
Not sure what your asking, but if fetch returns encoded (gzip) you need to decode it before piping to feedparser. When I do this using a mikeal request, this feed parses just fine. Note, this feed is not encoded.
node-fetch
doesn't auto-decode the body unless prompted by the user (which is why it's lightweight and faster from my own tests - like the browser's fetch API) - instead, I let feedparser take on the body. node-fetch
works over 99.999% of the time (it works reliably, and is lighter than other libs!) until I encountered this issue, where feedparser failed to detect an invalid feed for a feed like this (had me scratching me head for days as to what was happening).
You can see the link above has invalid content-encoding headers when you open it in chrome and you see ERR_CONTENT_DECODING_FAILED
and the explanation for it: https://stackoverflow.com/questions/14039804/error-330-neterr-content-decoding-failed. TLDR; response says its gzipped, and it's actually not - browser is confused. I assume feedparser is confused here as well, but instead of giving an error, feedparser just hangs indefinitely
Not a node-fetch user, but parses fine as a mikeal request (feed header in this case indicates that is not encoded).
I get that request
works, as you've said before, but I'm saying the problem is that feedparser isn't emitting any errors here, as I mentioned in my last sentence
Yea, just catching up to subtlety of your problem.
I just set myself up to use node-fetch. When I run it with your feed and {compress:false} option, it works.
Yes it will work in that case because then their server will send an uncompressed body, with no gzip in content-encoding
(the main problem with their feed)
but the point here is that feedparser is not emitting an error here when I really think it should (if it's possible) - the issue is basically being swallowed up by the parser. I'm not trying to find a way to work with the feed, just trying to suggest that feedparser should emit an error so
- Someone else doesn't spend hours scratching their heads why a feed may never resolve through feedparser
- To handle it appropriately in error handling
I think Feedparser is just waiting for input. I think node-fetch maybe hung, not handling the decoding correctly in your offending feeds case. If that's the case Feedparser may not be the place for the fix.
@synzen This doesn't appear to be related to feedparser.
Make the following change to your example script:
- res.body.pipe(feedparser)
+ res.body.pipe(process.stdout)
You'll still see the pathological behavior -- no output, no errors. 🤷♂