mbirk/dynalist_to_markdown

script error with unicode characters / emojis / accented characters

cannibalox opened this issue

The parser raises errors on Unicode characters such as `\u25b2` and `\ufffd`, and on emojis:

```
Traceback (most recent call last):
  File "dynalist_to_markdown.py", line 218, in <module>
    exporter.export()
  File "dynalist_to_markdown.py", line 27, in export
    self.process_files()
  File "dynalist_to_markdown.py", line 83, in process_files
    self.process_document(path, node, config)
  File "dynalist_to_markdown.py", line 95, in process_document
    self.document_to_markdown(path, node_by_id, config, out)
  File "dynalist_to_markdown.py", line 53, in document_to_markdown
    print(f'{indent}- {content}', file=out)
  File "C:\Python38\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u5988' in position 138: character maps to <undefined>
```
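
The last two frames show the failure happens while encoding output, not while parsing: `print()` hands the line to the `cp1252` codec, which has no byte for U+5988. A minimal reproduction sketch, independent of the script (the helper `can_encode` is hypothetical):

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of `text` has a byte mapping in `encoding`."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    samples = ["\u5988", "\u25b2", "\ufffd", "\U0001F600", "caf\u00e9"]
    for sample in samples:
        # cp1252 is the codec named in the traceback; UTF-8 can encode all of these.
        # ascii() keeps this demo itself from failing on a cp1252 console.
        print(ascii(sample), can_encode(sample, "cp1252"), can_encode(sample, "utf-8"))
```

Note that `é` (U+00E9) does exist in cp1252, so whether a given accented character fails depends on whether that code page contains it.
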
mbirk commented

Hi @cannibalox, sorry, I only just saw that you filed this issue! I did test with emojis, but it seems the behavior depends on the default codec, which is UTF-8 on my Mac.
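
To check which default codec applies on a given machine, here is a small diagnostic sketch (the function name `report_encodings` is hypothetical, not part of dynalist_to_markdown.py). It reports the `sys.stdout` encoding separately from the default used by `open()`, because the traceback alone doesn't show whether `out` is `sys.stdout` or a file, and `PYTHONIOENCODING` only affects the standard streams:

```python
import locale
import sys


def report_encodings() -> dict:
    """Collect the default encodings that decide how text is turned into bytes."""
    return {
        # Used by print() when writing to sys.stdout; PYTHONIOENCODING overrides it.
        "stdout": getattr(sys.stdout, "encoding", None),
        # Used by open() when no encoding= is passed; PYTHONIOENCODING does not change it.
        "open_default": locale.getpreferredencoding(False),
        # 1 when Python's UTF-8 mode is on (PYTHONUTF8=1 or -X utf8, Python 3.7+).
        "utf8_mode": sys.flags.utf8_mode,
    }


if __name__ == "__main__":
    for name, value in report_encodings().items():
        print(f"{name}: {value}")
```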

I think you can work around this by overriding the default codec: set the PYTHONIOENCODING environment variable to utf8. For example, in the Bash shell:

```bash
PYTHONIOENCODING=utf8 python dynalist_to_markdown.py
```

Or, in the Windows command prompt (where it appears you also need to set PYTHONLEGACYWINDOWSSTDIO):

```bat
set PYTHONIOENCODING=utf8
set PYTHONLEGACYWINDOWSSTDIO=1
python dynalist_to_markdown.py
```
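
A related option not mentioned in this thread is Python's UTF-8 mode (Python 3.7+, enabled with `PYTHONUTF8=1` or `python -X utf8`). Unlike `PYTHONIOENCODING`, which only changes stdin/stdout/stderr, UTF-8 mode also makes UTF-8 the default encoding for `open()`. If `out` in `document_to_markdown` is a file opened without an explicit encoding (an assumption; that code isn't shown here), this is the setting that would matter. A sketch of a launcher; `run_in_utf8_mode` is a hypothetical helper:

```python
import os
import subprocess
import sys


def run_in_utf8_mode(args):
    """Run the current Python interpreter with `args` in a child process in UTF-8 mode."""
    env = dict(os.environ, PYTHONUTF8="1")  # same effect as passing -X utf8
    # PYTHONIOENCODING takes precedence over UTF-8 mode for the standard
    # streams, so drop it to keep the child's stdio in UTF-8 as well.
    env.pop("PYTHONIOENCODING", None)
    return subprocess.run([sys.executable, *args], env=env, check=True)


if __name__ == "__main__":
    # Hypothetical usage: run the exporter from the current directory.
    run_in_utf8_mode(["dynalist_to_markdown.py"])
```

From a shell, the equivalent is simply `python -X utf8 dynalist_to_markdown.py`.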

In the meantime, I will work out the best way to fix this in the script itself so that the workaround is unnecessary (probably by encoding the string manually, but I'm not sure).
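
One possible in-script fix, sketched under an assumption: if `out` is a file the script opens with `open()` and no `encoding=` argument (the body of `process_document` isn't shown here), passing `encoding="utf-8"` pins the output encoding regardless of locale, with no manual `.encode()` calls. The function `write_markdown` and its `(indent, content)` input are hypothetical; only the `print(f'{indent}- {content}', file=out)` line comes from the traceback.

```python
def write_markdown(path, items):
    """Write (indent, content) pairs as a Markdown bullet list encoded as UTF-8.

    Passing encoding= explicitly makes the output independent of the platform
    default (cp1252 on the reporter's Windows setup, UTF-8 on the maintainer's Mac).
    """
    with open(path, "w", encoding="utf-8") as out:
        for indent, content in items:
            print(f"{indent}- {content}", file=out)
```

If the output goes to the console instead of a file, the stdout counterpart on Python 3.7+ is `sys.stdout.reconfigure(encoding="utf-8")`, which works when `sys.stdout` is a regular text stream.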