Let user decide their preferred encoding method
basvdheuvel opened this issue · 1 comments
- pypugjs version: 5.8.1
- Flask version: 1.1.1
- Python version: 3.6.9
- Operating System: Linux 5.2.9-arch1-1-ARCH
Description
If a template file contains a single non-ASCII character (e.g. "ë"), the conversion might contain a wrongly converted character (e.g. "ë"). After doing some research I found that this issue is strongly connected to this PyPugJS issue.
The problem is with the `chardet` package. By scanning a file, it makes a guess at the encoding of the file. However, if there is only a single non-ASCII character in the file, a wrong encoding might be detected (see this and this issue). A solution was proposed to this problem, but it never got accepted and is now out of date. The correspondence in that last referenced pull request contains a quick and dirty patch, which resolved the issue for me.
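To illustrate the failure mode without depending on what `chardet` happens to guess for a particular file, here is a minimal sketch of what goes wrong once the guess is off. It assumes the template is stored as UTF-8 and that the guessed encoding is Latin-1; the actual guess may differ from file to file, and `decode_with_guess` is a hypothetical helper, not part of PyPugJS.

```python
def decode_with_guess(text, guessed_encoding, actual_encoding="utf-8"):
    """Encode `text` as it is stored on disk, then decode it with a guessed encoding."""
    raw = text.encode(actual_encoding)
    return raw.decode(guessed_encoding)


# "ë" is stored as the two bytes C3 AB in UTF-8.
correct = decode_with_guess("ë", "utf-8")    # round-trips unchanged
garbled = decode_with_guess("ë", "latin-1")  # each byte becomes its own character

print(repr(correct))  # 'ë'
print(repr(garbled))  # 'Ã«'
```

A single two-byte sequence gives a detector very little evidence to work with, which is why short or mostly-ASCII templates are the ones that get hit.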
I am, however, not satisfied with this kind of solution. Users should not have to repeat the research I went through to resolve a strange bug such as this one. Even mentioning the hotfix in the PyPugJS documentation seems like the wrong way to go. The underlying problem is that this package currently forces users to depend on an unreliable encoding guess.
My proposal is to change the `open` method in `pypugjs/runtime.py` introduced in PR #27 to use a global setting which the user can use to force their preferred encoding. The default value would be `auto`, which uses `chardet`. Other values can be any strings, as long as they are valid names of encodings.
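A rough sketch of what I have in mind follows. The names `TEMPLATE_ENCODING`, `resolve_encoding` and `read_template` are made up for illustration and do not match the current code in `pypugjs/runtime.py`; only `chardet.detect` and `codecs.lookup` are existing library calls.

```python
import codecs

# Hypothetical module-level setting. "auto" keeps the chardet-based guess;
# any other value must be a valid Python encoding name such as "utf-8".
TEMPLATE_ENCODING = "auto"


def resolve_encoding(raw, setting=None):
    """Return the name of the encoding that `raw` should be decoded with."""
    setting = TEMPLATE_ENCODING if setting is None else setting
    if setting != "auto":
        codecs.lookup(setting)  # raises LookupError for an unknown encoding name
        return setting
    import chardet  # imported lazily, so a forced encoding does not need chardet
    guess = chardet.detect(raw)["encoding"]
    return guess or "utf-8"  # chardet reports None when it cannot decide


def read_template(path, setting=None):
    """Read a template file and decode it according to the setting."""
    with open(path, "rb") as handle:
        raw = handle.read()
    return raw.decode(resolve_encoding(raw, setting))
```

With something like this, a user who knows their templates are UTF-8 would set the encoding once and never touch the detection path, while everybody else keeps the current behaviour.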
I would love to do this work myself, but I am on a tight deadline for a job, and I might have neither the time nor the urgency to resolve the issue once I'm done with that job. Hopefully somebody else can pick up the slack. Many thanks!
This issue is affecting me on Linux as well, thanks for raising. Looking through the code, I think your proposal is valid.