[Bug]: zopen fails on some language platforms due to lack of explicit encoding (`UnicodeDecodeError`)
Closed this issue · 3 comments
rkingsbury commented
Email (Optional)
No response
Version
v2024.2.26
Which OS(es) are you using?
- MacOS
- Windows
- Linux
What happened?
As reported by @xiaoxiaozhu123 in another repository, connecting to a JSONStore in maggma using zopen fails with a UnicodeDecodeError on non-English platforms (or at least on a Chinese-language platform) because the character encoding is not set explicitly, so the platform's default encoding is used.
See KingsburyLab/pyEQL#122 (comment)
This occurred in:
python 3.11.8
monty.__version__
>> '2024.2.26'
maggma.__version__
>> '2024.2.26'
Code snippet
File ~\.conda\envs\pyiclab\Lib\site-packages\maggma\stores\mongolike.py:716, in JSONStore.read_json_file(self, path)
708 """
709 Helper method to read the contents of a JSON file and generate
710 a list of docs.
(...)
713 path: Path to the JSON file to be read
714 """
715 with zopen(path) as f:
--> 716 data = f.read()
717 data = data.decode() if isinstance(data, bytes) else data
718 objects = bson.json_util.loads(data) if "$oid" in data else orjson.loads(data)
UnicodeDecodeError: 'gbk' codec can't decode byte 0xab in position 893: illegal multibyte sequence
Log output
No response
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
rkingsbury commented
A possible solution is to explicitly set the encoding:
with zopen(path, encoding='utf-8') as f:
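To illustrate why the explicit encoding matters, here is a minimal sketch that uses the built-in `open` as a stand-in for `zopen` (an assumption made so the snippet runs without monty installed). The helper `read_text` and the sample file are hypothetical. The file is written as UTF-8 and then read back twice: once with `encoding="utf-8"` and once with `encoding="gbk"`, which simulates the default a Chinese-language Windows platform would pick when no encoding is given.

```python
import json
import tempfile
from pathlib import Path


def read_text(path, encoding):
    """Hypothetical helper: read a file as text with an explicit encoding."""
    with open(path, "rt", encoding=encoding) as f:
        return f.read()


# Write a small JSON document containing a non-ASCII character as UTF-8 bytes.
sample_path = Path(tempfile.mkdtemp()) / "sample.json"
original = json.dumps({"formula": "CO₂"}, ensure_ascii=False)
sample_path.write_bytes(original.encode("utf-8"))

# Reading with the encoding the file was written in recovers the text.
utf8_text = read_text(sample_path, "utf-8")
data = json.loads(utf8_text)

# Reading with a mismatched encoding either raises UnicodeDecodeError or
# yields garbled text; in both cases the original text is not recovered.
try:
    gbk_matches = read_text(sample_path, "gbk") == original
except UnicodeDecodeError:
    gbk_matches = False

print(data, gbk_matches)
```

Passing the encoding at the call site makes the result independent of the platform's locale settings.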
shyuep commented
This is not a bug. All args and kwargs are passed through to the actual open method, so you can set the encoding in the zopen call. This is a maggma problem, not a monty problem.
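The pass-through behavior described here can be sketched with a hypothetical zopen-like wrapper. This is not monty's actual implementation; the name `open_maybe_gzip` and the extension check are assumptions made for illustration. The wrapper forwards all positional and keyword arguments to either `gzip.open` or the built-in `open`, so an `encoding` keyword supplied by the caller reaches the underlying function unchanged.

```python
import gzip
import tempfile
from pathlib import Path


def open_maybe_gzip(filename, *args, **kwargs):
    """Hypothetical zopen-like wrapper (not monty's real code).

    Every extra argument, including ``encoding``, is forwarded untouched
    to the underlying open function.
    """
    if str(filename).lower().endswith(".gz"):
        return gzip.open(filename, *args, **kwargs)
    return open(filename, *args, **kwargs)


gz_path = Path(tempfile.mkdtemp()) / "docs.json.gz"
payload = '{"name": "水"}'

# The caller decides the encoding; the wrapper only passes it along.
with open_maybe_gzip(gz_path, "wt", encoding="utf-8") as f:
    f.write(payload)

with open_maybe_gzip(gz_path, "rt", encoding="utf-8") as f:
    text = f.read()

print(text)
```

With a wrapper of this shape, the choice of encoding is the responsibility of the calling code, which is consistent with handling the fix on the maggma side.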
rkingsbury commented
That makes sense; thanks for the direction, @shyuep. I will open a maggma issue.