UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd5 in position 79: invalid continuation byte

Question

Closed this issue a year ago · 2 comments

This error was mostly caused by my own mistake. Nevertheless, I'm sharing it here in case others encounter the same thing.
I'm using this script to convert my Logseq notes to Obsidian, and I got the error below:

Traceback (most recent call last):
  File "/Users/mettli/Documents/code/LogSeqToObsidian/convert_notes.py", line 397, in <module>
    lines = f.readlines()
            ^^^^^^^^^^^^^
  File "<frozen codecs>", line 322, in decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd5 in position 79: invalid continuation byte

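0xd5 is a byte that is not valid at this position in UTF-8. It probably comes from a note saved in a legacy single-byte encoding, such as one used on Windows. Since I switched my computer to a Mac some time ago, this may have been the cause of the issue.
The solution is quite simple, though: find the code on line 396 and modify it as follows: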
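To illustrate why this byte trips the UTF-8 decoder, here is a minimal sketch. The sample bytes are made up for the demonstration, and which legacy encoding actually produced the byte in the original notes is unknown; `cp1252` and `mac_roman` are only two plausible candidates.

```python
# Hypothetical sample: the byte 0xd5 followed by a plain ASCII letter.
sample = b"it\xd5s"

try:
    sample.decode("utf-8")
except UnicodeDecodeError as exc:
    # 0xd5 starts a two-byte UTF-8 sequence, but "s" is not a continuation byte.
    utf8_error = exc.reason
    print(utf8_error)

# The same bytes decode without error under single-byte legacy encodings,
# but each encoding maps 0xd5 to a different character.
as_cp1252 = sample.decode("cp1252")
as_mac_roman = sample.decode("mac_roman")
print(as_cp1252, as_mac_roman)
```

If one of the legacy decodings yields readable text, re-saving the affected note as UTF-8 may be a cleaner fix than changing the script.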

    with open(fpath, "r", errors="ignore") as f:

This worked for me, and I hope it helps others who are facing a similar issue.
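One caveat worth knowing: `errors="ignore"` drops undecodable bytes silently, so some characters may go missing from the converted notes. A small sketch on a made-up temporary file (the contents are hypothetical, and `encoding="utf-8"` is passed explicitly here so the result does not depend on the platform default):

```python
import os
import tempfile

# Hypothetical file contents: ASCII text with one stray 0xd5 byte.
raw = b"don\xd5t panic\n"

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(raw)
    fpath = tmp.name

try:
    # The invalid byte is skipped; no exception and no marker is left behind.
    with open(fpath, "r", encoding="utf-8", errors="ignore") as f:
        lines = f.readlines()
finally:
    os.remove(fpath)

print(lines)
```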

Answer 1 · 2023-08-04T00:24:03.000Z

Hey @xdliyushen thanks for reporting this! I updated the open call to use:

open(fpath, "r", encoding="utf-8", errors="replace")

I thought it would be better for decoding errors to be visible by default, so each offending byte is replaced with a �.

I think utf-8 is probably a safe encoding to assume by default, but I called it out explicitly here.
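As a rough illustration of what `errors="replace"` does, the sketch below reads a made-up file and counts the replacement characters so that affected notes can be reviewed afterwards. The helper name `read_lines_lenient` and the sample contents are hypothetical and are not part of the script.

```python
import os
import tempfile

REPLACEMENT = "\ufffd"  # U+FFFD, the character inserted for undecodable bytes


def read_lines_lenient(fpath):
    """Hypothetical helper: read lines as UTF-8 and count replacement characters."""
    with open(fpath, "r", encoding="utf-8", errors="replace") as f:
        lines = f.readlines()
    # Note: this also counts any U+FFFD that was already present in the file.
    replaced = sum(line.count(REPLACEMENT) for line in lines)
    return lines, replaced


# Hypothetical file contents: one line with a stray 0xd5 byte, one clean line.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"don\xd5t panic\nall good\n")
    demo_path = tmp.name

try:
    demo_lines, demo_replaced = read_lines_lenient(demo_path)
finally:
    os.remove(demo_path)

print(demo_lines, demo_replaced)
```

Unlike `errors="ignore"`, the marker stays in the output, so a search for � in the converted notes shows where the source file needs attention.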

Thanks again for the report!

Answer 2 · 2023-08-04T00:24:27.000Z

Fixed in b16e339