script error with unicode characters / emojis / accented characters
The parser raises errors on Unicode characters such as \u25b2 and \ufffd, and on emojis.
Traceback (most recent call last):
  File "dynalist_to_markdown.py", line 218, in <module>
    exporter.export()
  File "dynalist_to_markdown.py", line 27, in export
    self.process_files()
  File "dynalist_to_markdown.py", line 83, in process_files
    self.process_document(path, node, config)
  File "dynalist_to_markdown.py", line 95, in process_document
    self.document_to_markdown(path, node_by_id, config, out)
  File "dynalist_to_markdown.py", line 53, in document_to_markdown
    print(f'{indent}- {content}', file=out)
  File "C:\Python38\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u5988' in position 138: character maps to <undefined>
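Hi @cannibalox, sorry, I only just saw that you filed this issue! I did test with emojis, but it seems that the behavior depends on the default codec, which is utf8 on my Mac. Your traceback shows Python using cp1252 on your machine, and that codec has no mapping for characters like '\u5988'.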
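To confirm which codec is in play on a given machine, a small diagnostic along these lines may help. It is only a sketch: `can_encode` is a hypothetical helper, not something in dynalist_to_markdown.py, and the printed encodings will differ by platform and terminal.

```python
import locale
import sys


def can_encode(text, encoding):
    """Return True if `text` can be encoded with `encoding` (hypothetical helper)."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    # Encoding Python uses when printing to the console.
    print("stdout encoding:", sys.stdout.encoding)
    # Locale default, used by open() when no encoding argument is given.
    print("locale default:", locale.getpreferredencoding(False))
    # '\u5988' is the character named in the traceback.
    print("cp1252 can encode it:", can_encode("\u5988", "cp1252"))
    print("utf-8 can encode it:", can_encode("\u5988", "utf-8"))
```

If either reported encoding is cp1252, the error in the traceback is what I would expect.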
I think you should be able to work around this issue by overriding the default codec: set the PYTHONIOENCODING environment variable to utf8. For example, if you are using the Bash shell:
PYTHONIOENCODING=utf8 python dynalist_to_markdown.py
Or, using the Windows command prompt (where it appears you also need to set PYTHONLEGACYWINDOWSSTDIO):
set PYTHONIOENCODING=utf8
set PYTHONLEGACYWINDOWSSTDIO=1
python dynalist_to_markdown.py
In the meantime I will figure out the best way to fix this in the script itself, so the workaround is unnecessary. (Probably manually encoding the string, but I'm not sure.)
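One possible shape for an in-script fix is to make the output stream UTF-8 explicitly instead of relying on the platform default. This is a rough sketch and not necessarily what the script will end up doing: `open_markdown_output` and `write_item` are hypothetical names, and I am assuming the output is either a file opened by the script or standard output.

```python
import sys


def open_markdown_output(path=None):
    """Return a text stream that encodes as UTF-8 regardless of the locale default.

    Hypothetical helper, not part of dynalist_to_markdown.py.
    """
    if path is not None:
        # An explicit encoding avoids falling back to cp1252 on Windows.
        return open(path, "w", encoding="utf-8", newline="\n")
    # TextIOWrapper.reconfigure() exists from Python 3.7 on; skip it if
    # stdout has been replaced by an object that lacks the method.
    if hasattr(sys.stdout, "reconfigure"):
        sys.stdout.reconfigure(encoding="utf-8")
    return sys.stdout


def write_item(out, indent, content):
    # Same print call as the one shown in the traceback.
    print(f"{indent}- {content}", file=out)
```

With something like this, `write_item(open_markdown_output("notes.md"), "", "\u5988")` should write the character as UTF-8 bytes even when the console codec is cp1252 ("notes.md" is just an example path).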