mako.cmd.cmdline() reads template as binary but writes as text resulting in garbled newlines on Windows
ni-balexand opened this issue · 9 comments
The issue here can be demonstrated by running the following script on Windows:
import mako.cmd
import sys
with open("template_file", "w") as f:
print("Line 1", file=f)
print("Line 2", file=f)
print("Line 3", file=f)
sys.argv.append("--output-file")
sys.argv.append("rendered_file")
sys.argv.append("template_file")
assert sys.argv[1:] == ["--output-file", "rendered_file", "template_file"]
# pass sys.argv off to mako
mako.cmd.cmdline()
# awkward \r\r\n EOL sequences is what is written to disk
with open("rendered_file", "rb") as f:
rendered = f.read()
assert rendered == b'Line 1\r\r\nLine 2\r\r\nLine 3\r\r\n'
# If you were to open/read in text mode it shows up as two newlines
with open("rendered_file", "rt") as f:
rendered = f.read()
assert rendered == 'Line 1\n\nLine 2\n\nLine 3\n\n'
It's odd that read_file() in util.py opens / reads in binary mode but then cmdline() in cmd.py opens / writes in text mode. The whole point of using text mode should be to convert EOLs on Windows, but it does not do this correctly.
hi -
why would you not simply fix the file to have proper CR/NL combinations? (edit: OK I thought that was the input file, it's the output)
this part:
assert rendered == b'Line 1\r\r\nLine 2\r\r\nLine 3\r\r\n'
looks wrong?
I'd really like to use Python's text mode here and not go back to a Python 2-ish approach.
I guess that written file is the bug. I'm not really understanding the problem yet. I dont want to write "b" at all.
Mike Bayer has proposed a fix for this issue in the main branch:
switch to textmode in all cases https://gerrit.sqlalchemy.org/c/sqlalchemy/mako/+/4450
see above gerrit. does not work yet, as we have some super legacy BOM stuff I'd have to figure out and/or just remove.
the "reading bytes" thing is strictly python 2. if it's causing problems, it's out. I want to use the most modern Python features
OK I understand the issue now. here is the fix:
diff --git a/mako/cmd.py b/mako/cmd.py
index 93a6733..deb8b95 100755
--- a/mako/cmd.py
+++ b/mako/cmd.py
@@ -86,12 +86,12 @@ def cmdline(argv=None):
kw = dict(varsplit(var) for var in options.var)
try:
- rendered = template.render(**kw)
+ rendered = template.render_unicode(**kw)
except:
_exit()
else:
if output_file:
- open(output_file, "wt", encoding=output_encoding).write(rendered)
+ open(output_file, "wt", newline="", encoding=output_encoding).write(rendered)
else:
sys.stdout.write(rendered)
we have no need for text mode to convert newlines so we turn it off.
can you confirm this fixes all issues on your end? thanks
Yes, that fixes the issue.
I get the confusion now. Opening as text vs binary has several effects including:
- whether read/write operates with str or bytes types
- any necessary encoding / decoding
- newline conversion
Opening in text mode with newline="" will turn off newline conversion while leaving the character encoding alone, and continuing to use the str type for read/write operations. It looks like render_unicode will return a str type just as render did.
I don't see any specific reason why we would need to change render to render_unicode as far as the newline problem I'm looking at (reverting back to just render still looks fine).
Do you think changing to render_unicode is necessary for other string encoding reasons?
render() returns either bytes or string while render_unicode always returns string.
right it's not strictly related but the way it is is technically wrong. eventually typing will be added here and it will change anyway