Force encoding to skip chardet
nyanbinaryneko opened this issue · 1 comments
nyanbinaryneko commented
I'm doing a rather large archival project. A bottleneck I have encountered is chardet trying to guess encoding, when all I need is the returned URL. One way around this, is to optionally set the encoding on a request via Requests. Here's an example of the returned spam:
2019-04-09 22:33:27 [chardet.charsetprober] DEBUG: utf-8 confidence = 0.7525
2019-04-09 22:33:27 [chardet.charsetprober] DEBUG: SHIFT_JIS Japanese confidence = 0.01
2019-04-09 22:33:27 [chardet.charsetprober] DEBUG: EUC-JP Japanese confidence = 0.01
2019-04-09 22:33:27 [chardet.charsetprober] DEBUG: GB2312 Chinese confidence = 0.01
2019-04-09 22:33:27 [chardet.charsetprober] DEBUG: EUC-KR Korean confidence = 0.01
2019-04-09 22:33:27 [chardet.charsetprober] DEBUG: CP949 Korean confidence = 0.01
2019-04-09 22:33:27 [chardet.charsetprober] DEBUG: Big5 Chinese confidence = 0.01
2019-04-09 22:33:27 [chardet.charsetprober] DEBUG: EUC-TW not active
If all someone is doing is returning the URL, we can skip this step in Requests.
palewire commented
I don't know how to do this so I'm going to close the ticket. If you have a pull request you can make I'd consider it. Thanks.