Unicode en dash (u"\u2013") Is Not Replaced By sanitize_filename
kenlerner opened this issue · 3 comments
When running the following:
sanitized = sanitize_filename(txt, platform="Windows")
If the variable txt contains a Unicode en dash, the en dash is not replaced and the sanitized filename is invalid: an error occurs when a file is opened using the sanitized filename.
The following change works:
sanitized = sanitize_filename(re.sub(u"\u2013", "-", txt), platform="Windows")
I think the function should replace the Unicode en dash with an ASCII hyphen.
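The workaround above can be generalized to other dash-like characters. The following is a minimal sketch only: `ascii_dashes` and the `_DASHES` table are hypothetical names, not part of pathvalidate, and the choice of code points is an assumption about which dashes are worth mapping.

```python
# Hypothetical helper, not part of pathvalidate: map common Unicode
# dash characters to the ASCII hyphen-minus before sanitizing.
_DASHES = {
    0x2010: "-",  # HYPHEN
    0x2011: "-",  # NON-BREAKING HYPHEN
    0x2012: "-",  # FIGURE DASH
    0x2013: "-",  # EN DASH
    0x2014: "-",  # EM DASH
    0x2212: "-",  # MINUS SIGN
}


def ascii_dashes(text: str) -> str:
    """Return text with the dash characters listed above replaced by '-'."""
    return text.translate(_DASHES)


print(ascii_dashes("report 2019\u20132020.txt"))
```

The result could then be passed to sanitize_filename in place of the re.sub call shown above.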
@kenlerner
Thank you for your feedback.
Could I ask what made you think a Unicode dash is an invalid character for a filename?
Unicode normalization (NFC, NFKC, NFD, NFKD) leaves the en dash as it is.
I understand that Unicode dashes can be confusing in file names, but they are still valid characters for file names.
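The normalization point can be checked with the standard library. This is a small sketch; the sample name is made up for illustration.

```python
import unicodedata

# Made-up sample name containing U+2013 EN DASH.
name = "2019\u20132020.txt"

for form in ("NFC", "NFKC", "NFD", "NFKD"):
    normalized = unicodedata.normalize(form, name)
    # U+2013 has no decomposition mapping, so every form keeps it.
    print(form, "\u2013" in normalized)
```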
Python raised an exception when trying to create a file whose name contained a Unicode dash. The error was the same as the one reported here:
https://stackoverflow.com/questions/55867822/when-running-python-script-i-get-%C3%A2%E2%82%AC-instead-of-a-hyphen
I can create files whose names include a Unicode dash with Python.
If that exception happens only with a specific Python version, please upgrade Python or report the problem to the official Python team.
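A quick way to check this on a given machine is sketched below. It assumes a file system and filesystem encoding that accept Unicode names (the usual case on current Windows, macOS, and Linux setups); the file name is made up for illustration.

```python
import tempfile
from pathlib import Path

# Create a file whose name contains U+2013 EN DASH inside a temporary
# directory, then read it back. The directory is removed afterwards.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "notes 2019\u20132020.txt"
    path.write_text("hello", encoding="utf-8")
    created = path.exists()
    content = path.read_text(encoding="utf-8")

print(created, content)
```

If this raises an exception, the traceback and the Python version would help narrow down where the problem comes from.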
Also, the question at that link does not seem to be a filename problem; they had simply mixed ASCII hyphens and Unicode dashes in dictionary keys.