UnicodeDecodeError: 'charmap' codec can't decode byte
yuis-ice opened this issue
With the following urls.yaml, it succeeds:
kind: url
name: rakuten card promo
url: https://www.rakuten-card.co.jp/campaign/add-card/
filter:
  - css: "section[id='rule_detail']"
  - html2text:
      method: re
  - grep: "2,000"
> urlwatch --urls urls.yaml --test-filter 1
2枚目の楽天カードを作成&利用特典…2,000ポイント(期間限定ポイント)
(English: "Bonus for issuing and using a second Rakuten Card…2,000 points (limited-time points)")
But with the following urls.yaml, which contains non-ASCII UTF-8 text (the Japanese "ポイント" in the grep pattern), it fails with an error:
kind: url
name: rakuten card promo
url: https://www.rakuten-card.co.jp/campaign/add-card/
filter:
  - css: "section[id='rule_detail']"
  - html2text:
      method: re
  - grep: "2,000ポイント"
> urlwatch --urls urls.yaml --test-filter 1
Traceback (most recent call last):
  File "C:\Python37\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Python37\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\pg\urlwatch_dev\venv\Scripts\urlwatch.exe\__main__.py", line 7, in <module>
  File "c:\pg\urlwatch_dev\venv\lib\site-packages\urlwatch\cli.py", line 108, in main
    urlwatch = Urlwatch(command_config, config_storage, cache_storage, urls_storage)
  File "c:\pg\urlwatch_dev\venv\lib\site-packages\urlwatch\main.py", line 66, in __init__
    self.load_jobs()
  File "c:\pg\urlwatch_dev\venv\lib\site-packages\urlwatch\main.py", line 85, in load_jobs
    jobs = self.urls_storage.load_secure()
  File "c:\pg\urlwatch_dev\venv\lib\site-packages\urlwatch\storage.py", line 316, in load_secure
    jobs = self.load()
  File "c:\pg\urlwatch_dev\venv\lib\site-packages\urlwatch\storage.py", line 419, in load
    return self._parse(fp)
  File "c:\pg\urlwatch_dev\venv\lib\site-packages\urlwatch\storage.py", line 385, in _parse
    jobs = [JobBase.unserialize(job) for job in yaml.load_all(fp, Loader=yaml.SafeLoader)
  File "c:\pg\urlwatch_dev\venv\lib\site-packages\urlwatch\storage.py", line 385, in <listcomp>
    jobs = [JobBase.unserialize(job) for job in yaml.load_all(fp, Loader=yaml.SafeLoader)
  File "c:\pg\urlwatch_dev\venv\lib\site-packages\yaml\__init__.py", line 90, in load_all
    loader = Loader(stream)
  File "c:\pg\urlwatch_dev\venv\lib\site-packages\yaml\loader.py", line 34, in __init__
    Reader.__init__(self, stream)
  File "c:\pg\urlwatch_dev\venv\lib\site-packages\yaml\reader.py", line 85, in __init__
    self.determine_encoding()
  File "c:\pg\urlwatch_dev\venv\lib\site-packages\yaml\reader.py", line 124, in determine_encoding
    self.update_raw()
  File "c:\pg\urlwatch_dev\venv\lib\site-packages\yaml\reader.py", line 178, in update_raw
    data = self.stream.read(size)
  File "C:\Python37\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 214: character maps to <undefined>
Versions:
> python --version
Python 3.7.6
> urlwatch --version
urlwatch 2.25
OS: Windows 10, PowerShell
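A likely cause, judging only from the traceback: PyYAML's reader calls self.stream.read(size) on a stream that is already being decoded with cp1252, which suggests urls.yaml is opened as text without an explicit encoding. In that case Python 3.7 on Windows uses the locale's preferred encoding, typically cp1252 on Western-locale systems. "ポ" (U+30DD) is E3 83 9D in UTF-8, and 0x9D has no character in cp1252, which matches the error message. The standalone sketch below reproduces this with the standard library only; read_as and decodes_ok are hypothetical helpers for illustration, not part of urlwatch or PyYAML.

```python
import io

# The grep lines from the two urls.yaml files above (assumed saved as UTF-8).
FAILING_LINE = '  - grep: "2,000ポイント"\n'
WORKING_LINE = '  - grep: "2,000"\n'


def read_as(data: bytes, encoding: str) -> str:
    """Decode bytes through a text stream, like open(path, encoding=...) would.

    Hypothetical helper for illustration only.
    """
    with io.TextIOWrapper(io.BytesIO(data), encoding=encoding) as fp:
        return fp.read()


def decodes_ok(data: bytes, encoding: str) -> bool:
    """Return True if the bytes decode without a UnicodeDecodeError."""
    try:
        read_as(data, encoding)
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    failing = FAILING_LINE.encode("utf-8")
    working = WORKING_LINE.encode("utf-8")

    # U+30DD ("ポ") as UTF-8 bytes; the last byte is 0x9D.
    print("U+30DD as UTF-8:", "\u30dd".encode("utf-8"))

    # ASCII-only pattern decodes fine as cp1252, like the first urls.yaml.
    print("working line, cp1252:", decodes_ok(working, "cp1252"))
    # Non-ASCII pattern fails as cp1252, like the traceback.
    print("failing line, cp1252:", decodes_ok(failing, "cp1252"))
    # The same bytes decode fine when UTF-8 is requested explicitly.
    print("failing line, utf-8:", decodes_ok(failing, "utf-8"))
```

If this is the cause, opening urls.yaml with encoding="utf-8" would avoid it. A possible workaround in the meantime (untested with urlwatch 2.25) is Python's UTF-8 mode, available since 3.7, which makes open() default to UTF-8; in PowerShell that would be $env:PYTHONUTF8 = "1" before running urlwatch.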