unzip html by default
Closed · 1 comment
gitronald commented
This follows up on #21 and on issues raised off-platform. The parser is failing because the search engine has started compressing its HTML with brotli again. A previous commit (5bc0883) turned default unzipping off because the HTML was not being compressed at that time, even though brotli was included in the default requests header for 'Accept-Encoding' (WebSearcher/WebSearcher/searchers.py, line 20 in bb32cba).
It seems they're back to brotli compression, so as the safe option, we should always attempt to unzip by default and fall back to the raw response when it turns out not to be compressed.
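
A minimal sketch of what "always attempt to unzip" could look like, assuming the response body is available as raw bytes. The function name `decode_html`, its parameters, and the fallback behaviour are hypothetical and are not taken from WebSearcher; the only third-party call used is `brotli.decompress` from the `brotli` package, which is imported optionally.

```python
from typing import Callable, Optional

try:
    import brotli  # third-party package; assumed installed where brotli responses are expected
    _default_decompress: Optional[Callable[[bytes], bytes]] = brotli.decompress
except ImportError:
    _default_decompress = None


def decode_html(
    content: bytes,
    decompress: Optional[Callable[[bytes], bytes]] = None,
    encoding: str = "utf-8",
) -> str:
    """Try to decompress `content`; if that fails, treat it as plain HTML.

    `decompress` is a hypothetical hook so the decompressor can be swapped
    out; it defaults to brotli when the package is importable.
    """
    decompress = decompress or _default_decompress
    if decompress is not None:
        try:
            content = decompress(content)
        except Exception:
            # Not compressed (or not in this format): keep the raw bytes.
            pass
    return content.decode(encoding, errors="replace")
```

With this shape, a response that is already plain HTML passes through unchanged, so the default stays safe whether or not the search engine compresses its output on a given day.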