rtf_to_text ignores the errors parameter
powo opened this issue · 2 comments
powo commented
stevengj commented
Do you have an example .rtf file that illustrates your problem?
powo commented
Here is an example:
>>> striprtf.rtf_to_text(r"{\rtf1\ansi\ansicpg0 T\'e4st}", encoding="utf-8", errors="replace")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/powo/Sync/dev/bat/.venv/lib/python3.11/site-packages/striprtf/striprtf.py", line 136, in rtf_to_text
    out += bytes.fromhex(hexes).decode(encoding=encoding)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe4 in position 0: unexpected end of data
The expected behavior is that errors="replace" suppresses the exception and substitutes the replacement character for the undecodable byte, like this:
>>> b'T\xe4st'.decode("utf-8", errors="replace")
'T�st'