iw4p/proxy-scraper

UnicodeDecodeError: 'charmap' codec can't decode byte 0x98 in position 7617: character maps to <undefined>

are-you-serat opened this issue · 4 comments

Hi. I have a problem with the proxy checker. When I run it, I get this error: UnicodeDecodeError: 'charmap' codec can't decode byte 0x98 in position 7617: character maps to <undefined>. The proxy file I am checking contains 37,065 proxies.

PS C:\Users\Windows 11\PycharmProjects\proxy-scraper> python proxyChecker.py -t 20 -s google.com -l output.txt
Traceback (most recent call last):
  File "C:\Users\Windows 11\PycharmProjects\proxy-scraper\proxyChecker.py", line 120, in <module>
    check(file=args.list, timeout=args.timeout, method=args.proxy, site=args.site, verbose=args.verbose,
  File "C:\Users\Windows 11\PycharmProjects\proxy-scraper\proxyChecker.py", line 52, in check
    for line in f:
  File "C:\Users\Windows 11\AppData\Local\Programs\Python\Python311\Lib\encodings\cp1251.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'charmap' codec can't decode byte 0x98 in position 7617: character maps to <undefined>
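
For context, the traceback goes through cp1251.py, which suggests the file was opened with the Windows locale default (Windows-1251) rather than an explicit encoding. Byte 0x98 has no character assigned in that code page, while it is an ordinary continuation byte in UTF-8. The sketch below reproduces the same failure on a two-byte sample; the sample text and the helper name try_decode are assumptions for illustration, not taken from proxyChecker.py.

```python
# "И" (U+0418) encodes to the bytes D0 98 in UTF-8, so it contains
# the same 0x98 byte that the traceback complains about.
data = "И".encode("utf-8")


def try_decode(raw: bytes, encoding: str) -> str:
    """Return the decoded text, or a short failure note if decoding fails."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError as exc:
        return f"failed: {exc.reason}"


# cp1251 cannot map 0x98, UTF-8 decodes the same bytes without trouble.
print("cp1251:", try_decode(data, "cp1251"))
print("utf-8: ", try_decode(data, "utf-8"))
```

If the proxy list really is UTF-8, as the later replies indicate, the same mismatch would explain the error at whatever position the first such byte appears.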

iw4p commented

Hi. You can solve it by passing encoding="utf-8" wherever open() is called.
For instance:
with open(file, "w", encoding='utf8') as f:

Unfortunately it doesn't work

PS D:\Scrapers and Checkers\proxy-scraper> python proxyChecker.py -t 20 -s google.com -l output.txt
Traceback (most recent call last):
  File "D:\Scrapers and Checkers\proxy-scraper\proxyChecker.py", line 120, in <module>
    check(file=args.list, timeout=args.timeout, method=args.proxy, site=args.site, verbose=args.verbose,
  File "D:\Scrapers and Checkers\proxy-scraper\proxyChecker.py", line 52, in check
    for line in f:
  File "C:\Users\Windows 11\AppData\Local\Programs\Python\Python311\Lib\encodings\cp1251.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'charmap' codec can't decode byte 0x98 in position 7498: character maps to <undefined>
PS D:\Scrapers and Checkers\proxy-scraper>

iw4p commented

Did you add it to the open call used on line 52? The traceback still goes through cp1251.py, so it seems that file is still being opened without encoding='utf8'.

Thanks! It works fine.
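
The fix that resolved this thread can be sketched as follows. Because the traceback fails at `for line in f:`, the encoding argument has to go on the open call that reads the proxy list, not on one that writes it. This is a minimal sketch under assumptions: the helper name load_proxies and the sample file contents are hypothetical, and the actual code in proxyChecker.py may be structured differently.

```python
import os
import tempfile


def load_proxies(path: str, encoding: str = "utf-8") -> list[str]:
    """Read one proxy per line, decoding with an explicit encoding."""
    # Without encoding=..., open() uses the locale's preferred encoding,
    # which is cp1251 on the machine shown in the traceback.
    with open(path, "r", encoding=encoding) as f:
        return [line.strip() for line in f if line.strip()]


# Demo on a throwaway file that contains a non-ASCII line (hypothetical data).
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "output.txt")
    with open(sample, "w", encoding="utf-8") as f:
        f.write("127.0.0.1:8080\n# Имя\n10.0.0.1:3128\n")
    proxies = load_proxies(sample)

print(proxies)
```

Passing the encoding explicitly keeps the behavior the same across machines, since the default otherwise depends on the Windows locale.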