lepture/mistune

CLI mistune on Windows defaults to cp1251 encoding on Russian locale, produces error


Apparently, there's a bug in the Windows + Python combo on certain locales that results in wonky encoding choices by the interpreter.

In my case, when I ran python -m mistune ... on a file that contained the character ★ (U+2605 "Black Star"), it produced an error along the lines of:

[lines omitted for clarity]
   File "encodings\cp1251.py", line [xxx], in encode
UnicodeEncodeError: 'charmap' codec can't encode characters in position [yyy-zzz]: character maps to <undefined>
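For reference, the same failure can be reproduced without mistune, which suggests the codec rather than the parser is at fault. This is only a minimal sketch; `can_encode` is a hypothetical helper written for illustration and is not part of mistune.

```python
# U+2605 BLACK STAR, the character from the failing file.
star = "\u2605"


def can_encode(text: str, encoding: str) -> bool:
    """Return True if `text` can be represented in `encoding` (hypothetical helper)."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        # cp1251 is a single-byte Cyrillic code page with no slot for U+2605.
        return False
    return True


print(can_encode(star, "cp1251"))  # False
print(can_encode(star, "utf-8"))   # True
```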

My system language is Russian, so its choice of encoding is understandable to a degree. This appears to be a similar problem, and applying that along with this advice fixed it for me.

Maybe there's something to be adjusted in mistune to force it (?) to use UTF-8?
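One possible shape for such a change, sketched under assumptions: I don't know where in mistune's CLI the output is written, so `force_utf8` below is a hypothetical helper, and the cp1251 "console" is simulated with an in-memory buffer. It relies only on `io.TextIOWrapper.reconfigure()`, which is available since Python 3.7.

```python
import io


def force_utf8(stream):
    """Switch a text stream to UTF-8 if it supports reconfigure() (hypothetical helper)."""
    reconfigure = getattr(stream, "reconfigure", None)
    if reconfigure is not None:
        reconfigure(encoding="utf-8")
    return stream


# Simulate a console stream that picked up cp1251 from the locale.
# newline="" disables newline translation so the bytes are the same on every OS.
raw = io.BytesIO()
console = io.TextIOWrapper(raw, encoding="cp1251", newline="")

# Without this call, the write below would raise UnicodeEncodeError.
force_utf8(console)
console.write("<p>\u2605</p>\n")
console.flush()

print(raw.getvalue())  # b'<p>\xe2\x98\x85</p>\n'
```

In a real CLI the call would presumably be `force_utf8(sys.stdout)` before any output is written. Without touching the code at all, Python's UTF-8 mode (`-X utf8` or the `PYTHONUTF8=1` environment variable) changes the default encoding for the whole interpreter.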

  • Windows 10
  • Python 3.12.2
  • mistune 3.0.2

@JinEnMok A pull request is welcome.