Add a tag that suggests setting an encoding when using `open`
Zeturic opened this issue · 1 comment
Tag Name
`!encoding` or `!utf8`
What kind of content should the tag include?
It'll probably be a lot, but I think it's important to mention most, if not all, of the following:
- What an encoding is, and what happens if you try to read a file with the wrong encoding.
- The "system default encoding" that `open` uses by default, how it's not necessarily the encoding the user meant, and why you should therefore always explicitly set `encoding` when `open`ing a file in text mode.
- The ubiquity of UTF-8 among text files, which means `encoding="UTF-8"` is almost always the right choice.
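A minimal sketch of the first two points, assuming Python 3.9+. The helper `encodings_used` is hypothetical: it opens a temporary file once without `encoding` and once with `encoding="UTF-8"`, and reports the normalized codec name each file object ended up with. Per the Python docs, the implicit one is the locale's encoding (`locale.getpreferredencoding(False)`, or `locale.getencoding()` from 3.11) unless UTF-8 mode is enabled, so it varies by machine; the explicit one does not.

```python
import codecs
import os
import tempfile


def encodings_used() -> tuple[str, str, bool]:
    """Return (implicit codec, explicit codec, round-trip ok).

    Hypothetical helper for illustration only.
    """
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        # No encoding argument: open() falls back to the system default.
        with open(path, "w") as f:
            implicit = f.encoding
        # Explicit encoding: the result no longer depends on the machine.
        with open(path, "w", encoding="UTF-8") as f:
            explicit = f.encoding
            f.write("naïve café")
        # Reading back with the same explicit encoding preserves non-ASCII text.
        with open(path, encoding="UTF-8") as f:
            round_trip_ok = f.read() == "naïve café"
    finally:
        os.remove(path)
    # codecs.lookup() normalizes aliases such as "UTF-8" or "utf8" to "utf-8".
    return codecs.lookup(implicit).name, codecs.lookup(explicit).name, round_trip_ok


if __name__ == "__main__":
    print(encodings_used())
```

On many Windows setups the first element is `cp1252`, while on most Linux and macOS systems it is already `utf-8`, which is why this kind of bug tends to show up only on some machines.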
It might also be worth briefly mentioning PEP 597 and the possible future in which implicit use of the system default encoding (SDE) is deprecated, as part of the gradual transition to making UTF-8 the default encoding for `open` regardless of the SDE.
As for the "why": a lot of people ask questions about `UnicodeDecodeError`s that essentially come down to "my file is UTF-8, but I didn't explicitly set the encoding, so my Windows machine is trying to use CP-1252". Or, even worse, their UTF-8 text happens to be valid CP-1252, so non-ASCII characters get silently mangled.
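Both failure modes can be reproduced on any OS by passing `cp1252` explicitly to stand in for a Windows system default (an assumption for illustration); the helper `read_back` is hypothetical. A right double quotation mark (U+201D) is `E2 80 9D` in UTF-8, and `0x9D` has no mapping in Python's `cp1252` codec, so decoding raises `UnicodeDecodeError`. By contrast, `é` is `C3 A9`, which happens to be valid `cp1252` and silently becomes `Ã©`.

```python
import os
import tempfile


def read_back(text: str, read_encoding: str) -> str:
    """Write `text` as UTF-8, then read it back with `read_encoding`.

    Hypothetical helper; `read_encoding` simulates whatever the system default is.
    """
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)
        with open(path, encoding=read_encoding) as f:
            return f.read()
    finally:
        os.remove(path)


if __name__ == "__main__":
    # Case 1: the UTF-8 bytes include 0x9D, which cp1252 cannot decode.
    try:
        read_back("He said \u201chi\u201d", "cp1252")
    except UnicodeDecodeError as exc:
        print("UnicodeDecodeError:", exc)

    # Case 2: the bytes happen to be valid cp1252, so the text is mangled.
    print(read_back("café", "cp1252"))  # cafÃ©

    # The fix: read with the encoding the file was actually written in.
    print(read_back("café", "utf-8"))  # café
```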
A tag that explains not only how to fix it (`encoding="UTF-8"`) but also why it's a bad idea to rely implicitly on the SDE would be very useful.
Thank you for making this suggestion. In order to be successful, tags need to be brief. If the intended beneficiaries of this tag are those who are struggling with a current UnicodeDecodeError, then they are probably interested in resolving that error as quickly as possible and resuming what they were doing, and are probably not interested in learning about file encodings in general at that particular moment. In other words, the tag would need to narrowly focus on resolving the error, in as few words as possible.
Would you be interested in writing a draft? Tags are rendered from markdown, so you can post a draft as a response in this thread, and it will look more or less how it would look when posted by the bot.