mako.cmd.cmdline() reads template as binary but writes as text resulting in garbled newlines on Windows

Question

mako.cmd.cmdline() reads template as binary but writes as text resulting in garbled newlines on Windows

ni-balexand opened this issue 2 years ago · 9 comments

The issue here can be demonstrated by running the following script on Windows:

import mako.cmd
import sys

with open("template_file", "w") as f:
    print("Line 1", file=f)
    print("Line 2", file=f)
    print("Line 3", file=f)

sys.argv.append("--output-file")
sys.argv.append("rendered_file")
sys.argv.append("template_file")

assert sys.argv[1:] == ["--output-file", "rendered_file", "template_file"]

# pass sys.argv off to mako
mako.cmd.cmdline()

# awkward \r\r\n EOL sequences is what is written to disk
with open("rendered_file", "rb") as f:
    rendered = f.read()
assert rendered == b'Line 1\r\r\nLine 2\r\r\nLine 3\r\r\n'

# If you were to open/read in text mode it shows up as two newlines
with open("rendered_file", "rt") as f:
    rendered = f.read()
assert rendered == 'Line 1\n\nLine 2\n\nLine 3\n\n'

It's odd that read_file() in util.py opens / reads in binary mode but then cmdline() in cmd.py opens / writes in text mode. The whole point of using text mode should be to convert EOLs on Windows, but it does not do this correctly.

Answer 1 · 2023-02-15T18:49:19.000Z

hi -

why would you not simply fix the file to have proper CR/NL combinations? (edit: OK I thought that was the input file, it's the output)

this part:

 assert rendered == b'Line 1\r\r\nLine 2\r\r\nLine 3\r\r\n'

looks wrong?

I'd really like to use Python's text mode here and not go back to a Python 2-ish approach.

Answer 2 · 2023-02-15T18:51:12.000Z

I guess that written file is the bug. I'm not really understanding the problem yet. I dont want to write "b" at all.

Answer 3 · 2023-02-15T20:56:42.000Z

Mike Bayer has proposed a fix for this issue in the main branch:

switch to textmode in all cases https://gerrit.sqlalchemy.org/c/sqlalchemy/mako/+/4450

Answer 4 · 2023-02-15T20:57:20.000Z

see above gerrit. does not work yet, as we have some super legacy BOM stuff I'd have to figure out and/or just remove.

the "reading bytes" thing is strictly python 2. if it's causing problems, it's out. I want to use the most modern Python features

Answer 5 · 2023-02-15T21:48:40.000Z

OK I understand the issue now. here is the fix:

diff --git a/mako/cmd.py b/mako/cmd.py
index 93a6733..deb8b95 100755
--- a/mako/cmd.py
+++ b/mako/cmd.py
@@ -86,12 +86,12 @@ def cmdline(argv=None):
 
     kw = dict(varsplit(var) for var in options.var)
     try:
-        rendered = template.render(**kw)
+        rendered = template.render_unicode(**kw)
     except:
         _exit()
     else:
         if output_file:
-            open(output_file, "wt", encoding=output_encoding).write(rendered)
+            open(output_file, "wt", newline="", encoding=output_encoding).write(rendered)
         else:
             sys.stdout.write(rendered)

we have no need for text mode to convert newlines so we turn it off.

can you confirm this fixes all issues on your end? thanks

Answer 6 · 2023-02-15T23:31:00.000Z

Yes, that fixes the issue.

I get the confusion now. Opening as text vs binary has several effects including:

whether read/write operates with str or bytes types
any necessary encoding / decoding
newline conversion

Opening in text mode with newline="" will turn off newline conversion while leaving the character encoding alone, and continuing to use the str type for read/write operations. It looks like render_unicode will return a str type just as render did.

Answer 7 · 2023-02-15T23:35:58.000Z

I don't see any specific reason why we would need to change render to render_unicode as far as the newline problem I'm looking at (reverting back to just render still looks fine).

Do you think changing to render_unicode is necessary for other string encoding reasons?

Answer 8 · 2023-02-15T23:36:08.000Z

render() returns either bytes or string while render_unicode always returns string.

Answer 9 · 2023-02-15T23:36:36.000Z

right it's not strictly related but the way it is is technically wrong. eventually typing will be added here and it will change anyway