Newlines incorrectly handled in reverse_readfile on windows
pmrv opened this issue · 2 comments
System
- Monty version: 2022.9.9
- Python version: 3.11
- OS version: windows
Summary
In reverse_readfile the line separator is hard-coded as \n, but since monty opens the file in binary mode, Python doesn't do the usual newline translation, so you end up with a spurious \r at the end of lines read by reverse_readfile. I would think reverse_readline suffers from the same problem. I've only come across this on Windows, but a similar issue should happen with classic Mac-style line endings, where monty wouldn't detect any lines in a file, since the line separator is just \r there.
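The mechanism described above can be reproduced with the standard library alone. The following is a minimal sketch that does not call monty; the file name and variable names are made up for illustration. It writes CRLF-terminated lines as raw bytes, then compares a binary-mode split on \n with a text-mode read.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "crlf.txt")

    # Write CRLF-terminated lines as raw bytes, so no newline translation happens.
    with open(path, "wb") as f:
        f.write(b"Line1\r\nLine2\r\nLine3")

    # Binary mode: splitting on b"\n" leaves the "\r" of each CRLF pair behind.
    with open(path, "rb") as f:
        binary_lines = [part.decode("utf-8") for part in f.read().split(b"\n")]

    # Text mode with the default newline=None: "\r\n" is translated to "\n" on read.
    with open(path, "r", encoding="utf-8") as f:
        text_lines = f.read().split("\n")

print(binary_lines)  # ['Line1\r', 'Line2\r', 'Line3']
print(text_lines)    # ['Line1', 'Line2', 'Line3']
```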
Example code
I don't have a working installation of python+monty on windows, but there's an example output in our CI here.
Suggested solution (if known)
Just guessing, but a simple solution might just be to open the files in text mode or pass the newline argument to the underlying Python functions, since you .decode('utf8') all strings anyway. I'm not sure if this would interfere with your handling of compressed files. If it does, you'd have to replace every occurrence of \n in the code with os.linesep.
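One way to stay in binary mode (and so keep compressed files working) while still handling every line ending might be bytes.splitlines(), which treats \n, \r\n and \r as line boundaries. The sketch below is only an illustration: reversed_lines is a hypothetical helper, not monty's implementation, and it reads the whole file into memory, which reverse_readfile presumably avoids for large files.

```python
import gzip
import os
import tempfile


def reversed_lines(path):
    """Yield the lines of a plain or gzip-compressed file in reverse order.

    Hypothetical helper for illustration only. It loads the whole file into
    memory, so it only suits files that fit in RAM.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as f:
        data = f.read()
    # bytes.splitlines() splits on "\n", "\r\n" and "\r" and drops the terminators.
    for raw in reversed(data.splitlines()):
        yield raw.decode("utf-8")


with tempfile.TemporaryDirectory() as tmp:
    plain = os.path.join(tmp, "sample.txt")
    packed = os.path.join(tmp, "sample.txt.gz")
    payload = b"Line1\r\nLine2\r\nLine3"

    with open(plain, "wb") as f:
        f.write(payload)
    with gzip.open(packed, "wb") as f:
        f.write(payload)

    plain_result = list(reversed_lines(plain))
    packed_result = list(reversed_lines(packed))

print(plain_result)   # ['Line3', 'Line2', 'Line1']
print(packed_result)  # ['Line3', 'Line2', 'Line1']
```

Because the split happens on the bytes actually read, the result does not depend on the platform the code runs on, unlike a replacement of \n with os.linesep.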
I'm able to reproduce this issue and will fix it today.
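from monty.io import reverse_readfile

# newline="\r\n" translates every "\n" written, including the one inside each
# "\r\n" separator, so this file ends up with "\r\r\n" between lines.
with open("sample_windows.txt", "w", newline="\r\n") as f:
    f.write("\r\n".join(["Line1", "Line2", "Line3"]))

# newline="\n" writes the text unchanged, giving plain "\n" separators.
with open("sample_unix_mac.txt", "w", newline="\n") as f:
    f.write("\n".join(["Line1", "Line2", "Line3"]))

for filename in ("sample_windows.txt", "sample_unix_mac.txt"):
    print(f"Reading file: {filename}")
    for line in reverse_readfile(filename):
        print(repr(line))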
Generates:
Reading file: sample_windows.txt
'Line3'
'Line2\r\r'
'Line1\r\r'
Reading file: sample_unix_mac.txt
'Line3'
'Line2'
'Line1'
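The doubled \r in the Windows output is consistent with how the sample file is written: with newline="\r\n", Python replaces every \n written with \r\n, including the \n that is already part of each \r\n separator. A small standard-library sketch (file names made up for illustration) that compares the bytes on disk with and without that translation:

```python
import os
import tempfile

text = "\r\n".join(["Line1", "Line2", "Line3"])

with tempfile.TemporaryDirectory() as tmp:
    translated = os.path.join(tmp, "translated.txt")
    untouched = os.path.join(tmp, "untouched.txt")

    # newline="\r\n": every "\n" written is replaced by "\r\n".
    with open(translated, "w", newline="\r\n") as f:
        f.write(text)

    # newline="": nothing is translated on write.
    with open(untouched, "w", newline="") as f:
        f.write(text)

    with open(translated, "rb") as f:
        translated_bytes = f.read()
    with open(untouched, "rb") as f:
        untouched_bytes = f.read()

print(translated_bytes)  # b'Line1\r\r\nLine2\r\r\nLine3'
print(untouched_bytes)   # b'Line1\r\nLine2\r\nLine3'
```

With plain \r\n line endings on disk, a reader that splits on \n in binary mode would be expected to leave a single trailing \r rather than two.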
The issue indeed exists for reverse_readline, with:
from monty.io import reverse_readline

# Same sample files as before: "\r\r\n" separators in the first, "\n" in the second.
with open("sample_windows.txt", "w", newline="\r\n") as f:
    f.write("\r\n".join(["Line1", "Line2", "Line3"]))

with open("sample_unix_mac.txt", "w", newline="\n") as f:
    f.write("\n".join(["Line1", "Line2", "Line3"]))

for filename in ("sample_windows.txt", "sample_unix_mac.txt"):
    print(f"Reading file: {filename}")
    # The file is opened in text mode here, so newline translation applies on read.
    with open(filename) as file:
        for line in reverse_readline(file):
            print(repr(line))
We now have:
Reading file: sample_windows.txt
'Line3'
''
'Line2'
''
'Line1'
Reading file: sample_unix_mac.txt
'Line3'
'Line2'
'Line1'
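A likely explanation for the empty strings, assuming the sample file holds \r\r\n between lines as the snippet writes it: in text mode with the default newline=None, a lone \r and a \r\n are each translated to \n on read, so every separator turns into two line breaks. The sketch below checks only that translation step with the standard library; it does not call reverse_readline.

```python
import io

# Bytes as written to sample_windows.txt by the snippet above.
raw = b"Line1\r\r\nLine2\r\r\nLine3"

# newline=None enables universal newlines: "\r", "\n" and "\r\n" all become "\n".
reader = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8", newline=None)
decoded = reader.read()
lines = decoded.split("\n")

print(repr(decoded))  # 'Line1\n\nLine2\n\nLine3'
print(lines)          # ['Line1', '', 'Line2', '', 'Line3']
```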