materialsvirtuallab/monty

[Bug]: zopen fails on some language platforms due to lack of explicit encoding (`UnicodeDecodeError`)

Closed this issue · 3 comments

Version

v2024.2.26

Which OS(es) are you using?

  • MacOS
  • Windows
  • Linux

What happened?

As reported by @xiaoxiaozhu123 in another repository, connecting to a JSONStore in maggma using zopen fails with a UnicodeDecodeError on non-English platforms (or at least on a Chinese-language platform) because the character encoding is not set explicitly, so the platform default (here, gbk) is used to decode the file.

See KingsburyLab/pyEQL#122 (comment)

This occurred in:

python 3.11.8
monty.__version__
>> '2024.2.26'
maggma.__version__
>> '2024.2.26'

Code snippet

File ~\.conda\envs\pyiclab\Lib\site-packages\maggma\stores\mongolike.py:716, in JSONStore.read_json_file(self, path)
    708 """
    709 Helper method to read the contents of a JSON file and generate
    710 a list of docs.
   (...)
    713     path: Path to the JSON file to be read
    714 """
    715 with zopen(path) as f:
--> 716     data = f.read()
    717     data = data.decode() if isinstance(data, bytes) else data
    718     objects = bson.json_util.loads(data) if "$oid" in data else orjson.loads(data)

UnicodeDecodeError: 'gbk' codec can't decode byte 0xab in position 893: illegal multibyte sequence


A possible solution is to explicitly set the encoding:

with zopen(path, encoding='utf-8') as f:
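
To illustrate why the explicit encoding matters, here is a minimal sketch that uses only the built-in `open` rather than zopen. The file name `docs.json` and the sample document are made up for the example, and passing `encoding="gbk"` stands in for a platform whose default text encoding is GBK.

```python
import json
import os
import tempfile

# Hypothetical document; the euro sign is encoded in UTF-8 as bytes E2 82 AC.
doc = {"price": "5€"}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "docs.json")

    # Write UTF-8 explicitly so the bytes on disk do not depend on the platform.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(doc, f, ensure_ascii=False)

    # Simulate a platform whose default text encoding is GBK.
    try:
        with open(path, encoding="gbk") as f:
            f.read()
        gbk_failed = False
    except UnicodeDecodeError:
        gbk_failed = True

    # An explicit UTF-8 encoding decodes the same bytes on any platform.
    with open(path, encoding="utf-8") as f:
        loaded = json.load(f)

print(gbk_failed, loaded == doc)
```

The same file reads cleanly or fails depending only on the encoding used to decode it, which is why the failure shows up on some language platforms and not others.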

This is not a bug. All args and kwargs are passed through to the underlying open method, so you can set the encoding in the zopen call. This is a maggma problem, not a monty problem.
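
As a rough illustration of that pass-through behavior, the sketch below uses a hypothetical `open_any` helper. It is not monty's implementation, only a stand-in that forwards keyword arguments to `gzip.open` or the built-in `open`, which is enough to show that the caller controls the encoding.

```python
import gzip
import os
import tempfile


def open_any(path, mode="rt", **kwargs):
    """Hypothetical zopen-style wrapper (not monty's code).

    Chooses an opener from the file extension and forwards every
    keyword argument, including ``encoding``, to it unchanged.
    """
    if str(path).endswith(".gz"):
        return gzip.open(path, mode, **kwargs)
    return open(path, mode, **kwargs)


text = "α-quartz"  # non-ASCII sample content

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "docs.json.gz")

    # The caller, not the wrapper, decides the encoding.
    with open_any(path, "wt", encoding="utf-8") as f:
        f.write(text)

    with open_any(path, "rt", encoding="utf-8") as f:
        round_tripped = f.read()

print(round_tripped == text)
```

With a wrapper of this shape, a caller that omits `encoding` gets whatever the platform default is, so the fix belongs at the call site.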

That makes sense; thanks for the direction, @shyuep. I will open a maggma issue.