gitronald/WebSearcher

unzip html by default


This follows up on #21 and on issues raised off-platform. The parser is failing because the search engine has started compressing its HTML with brotli again. A previous commit (5bc0883) turned default unzipping off because, at that time, the HTML was not being compressed, even though brotli was included in the default requests header for 'Accept-Encoding':

`default_encoding = 'gzip,deflate,br'`

It seems they're back to brotli compression, so, to be safe, we should always attempt to unzip by default.
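The reason to *attempt* unzipping, rather than check first, is that brotli streams have no magic header, so the only dependable test is to try decompressing and fall back to the raw bytes if that fails. The sketch below illustrates that idea; it is not the code from 4571201. The helper name `maybe_unzip` is hypothetical, and the third-party `brotli` package is treated as optional. Note also that `requests` (through urllib3) only decodes `br` responses automatically when a Brotli package is installed, which may be one reason a manual fallback helps.

```python
import gzip
import zlib

try:
    import brotli  # third-party; optional in this sketch
except ImportError:
    brotli = None


def maybe_unzip(content: bytes) -> bytes:
    """Hypothetical helper: decompress a response body if possible, else return it unchanged."""
    # gzip has a fixed two-byte magic header, so it can be detected directly.
    if content[:2] == b"\x1f\x8b":
        try:
            return gzip.decompress(content)
        except (OSError, EOFError):
            pass
    # zlib-wrapped deflate has a header check and a checksum, so false positives are unlikely.
    try:
        return zlib.decompress(content)
    except zlib.error:
        pass
    # Brotli has no magic header: just attempt it, and keep the raw bytes on failure.
    if brotli is not None:
        try:
            return brotli.decompress(content)
        except Exception:  # brotli raises brotli.error on invalid input
            pass
    return content
```

Trying the self-identifying formats first and brotli last keeps the guesswork to the one format that cannot be recognized from its header.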

Fixed with 4571201