materialsvirtuallab/monty

Newlines incorrectly handled in reverse_readfile on windows

pmrv opened this issue · 2 comments

pmrv commented

System

  • Monty version: 2022.9.9
  • Python version: 3.11
  • OS version: windows

Summary

In reverse_readfile the line separator is hard-coded as \n. Because monty opens the file in binary mode, Python doesn't do its usual newline translation, so lines read by reverse_readfile end with a spurious \r. I would think reverse_readline suffers from the same problem. I've only come across this on Windows, but a similar issue should occur for files with classic Mac OS line endings, where the line separator is a bare \r: monty wouldn't detect any line breaks in such files.
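The effect can be illustrated without monty. The snippet below is only a minimal sketch, not monty's actual code: it assumes the reader opens the file in binary mode and splits on a hard-coded b"\n", and compares that with Python's text-mode newline translation. The file name is made up for the example.

```python
import os
import tempfile

# Write a small file with Windows (CRLF) line endings, byte for byte.
path = os.path.join(tempfile.mkdtemp(), "crlf.txt")
with open(path, "wb") as f:
    f.write(b"Line1\r\nLine2\r\nLine3")

# Binary mode: Python performs no newline translation, so splitting on
# b"\n" leaves the "\r" attached to every line but the last.
with open(path, "rb") as f:
    binary_lines = [part.decode("utf-8") for part in f.read().split(b"\n")]

# Text mode with the default newline=None: "\r\n" is translated to "\n".
with open(path, "r", encoding="utf-8") as f:
    text_lines = f.read().split("\n")

print(binary_lines)  # ['Line1\r', 'Line2\r', 'Line3']
print(text_lines)    # ['Line1', 'Line2', 'Line3']
```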

Example code

I don't have a working installation of Python + monty on Windows, but there's an example output in our CI here.

Suggested solution (if known)

Just guessing, but a simple solution might be to open the files in text mode, or to pass the newline argument to the underlying Python functions, since you .decode('utf8') all strings anyway. I'm not sure whether this would interfere with your handling of compressed files. If it does, you'd have to replace every occurrence of \n in the code with os.linesep.
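A rough sketch of the text-mode idea, for illustration only. The helper name reverse_lines_text_mode is hypothetical and is not part of monty; it reads the whole file into memory, so it is not a drop-in replacement for a reader meant for large files. It does suggest that text mode need not conflict with gzip-compressed input, since gzip.open also accepts a text mode and a newline argument.

```python
import gzip
import os
import tempfile


def reverse_lines_text_mode(path):
    """Hypothetical helper: yield lines last-to-first, without line endings.

    Reads the whole file into memory, so this is a sketch for small files.
    """
    opener = gzip.open if path.endswith(".gz") else open
    # Text mode with newline=None enables universal newlines on read:
    # both "\r\n" and a bare "\r" are translated to "\n".
    with opener(path, "rt", encoding="utf-8", newline=None) as f:
        lines = f.read().split("\n")
    # Drop the empty string produced by a trailing newline, if any.
    if lines and lines[-1] == "":
        lines.pop()
    yield from reversed(lines)


tmpdir = tempfile.mkdtemp()
plain = os.path.join(tmpdir, "crlf.txt")
zipped = os.path.join(tmpdir, "crlf.txt.gz")
payload = b"Line1\r\nLine2\r\nLine3\r\n"

with open(plain, "wb") as f:
    f.write(payload)
with gzip.open(zipped, "wb") as f:
    f.write(payload)

print(list(reverse_lines_text_mode(plain)))   # ['Line3', 'Line2', 'Line1']
print(list(reverse_lines_text_mode(zipped)))  # ['Line3', 'Line2', 'Line1']
```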

I'm able to reproduce this issue and will fix it today.

from monty.io import reverse_readfile


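# Note: newline="\r\n" translates every "\n" written into "\r\n", so the
# "\r\n" separators below end up on disk as "\r\r\n".
with open("sample_windows.txt", "w", newline="\r\n") as f:
    f.write("\r\n".join(["Line1", "Line2", "Line3"]))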

with open("sample_unix_mac.txt", "w", newline="\n") as f:
    f.write("\n".join(["Line1", "Line2", "Line3"]))

for filename in ("sample_windows.txt", "sample_unix_mac.txt"):
    print(f"Reading file: {filename}")
    for line in reverse_readfile(filename):
        print(repr(line))

Generates:

Reading file: sample_windows.txt
'Line3'
'Line2\r\r'
'Line1\r\r'
Reading file: sample_unix_mac.txt
'Line3'
'Line2'
'Line1'
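The doubled \r in the Windows output is consistent with how the sample file is written: with newline="\r\n", Python translates each "\n" it writes into "\r\n", so a "\r\n" separator lands on disk as "\r\r\n". A real CRLF file would presumably show a single trailing \r instead. The check below is a small sketch with a made-up file name; it inspects the raw bytes and shows that newline="" writes the separators unchanged.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "sample.txt")
text = "\r\n".join(["Line1", "Line2", "Line3"])

# newline="\r\n": every "\n" written becomes "\r\n", doubling the "\r".
with open(path, "w", newline="\r\n") as f:
    f.write(text)
with open(path, "rb") as f:
    doubled = f.read()

# newline="": no translation on write, so the bytes match the string.
with open(path, "w", newline="") as f:
    f.write(text)
with open(path, "rb") as f:
    exact = f.read()

print(doubled)  # b'Line1\r\r\nLine2\r\r\nLine3'
print(exact)    # b'Line1\r\nLine2\r\nLine3'
```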

The issue indeed exists for reverse_readline, with:

from monty.io import reverse_readline


with open("sample_windows.txt", "w", newline="\r\n") as f:
    f.write("\r\n".join(["Line1", "Line2", "Line3"]))

with open("sample_unix_mac.txt", "w", newline="\n") as f:
    f.write("\n".join(["Line1", "Line2", "Line3"]))

for filename in ("sample_windows.txt", "sample_unix_mac.txt"):
    print(f"Reading file: {filename}")
    with open(filename) as file:
        for line in reverse_readline(file):
            print(repr(line))

We now have:

Reading file: sample_windows.txt
'Line3'
''
'Line2'
''
'Line1'
Reading file: sample_unix_mac.txt
'Line3'
'Line2'
'Line1'
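The empty strings in the Windows output are consistent with the file holding "\r\r\n" separators and being opened in text mode: universal newlines turn the bare "\r" and the following "\r\n" into two separate "\n" characters, which leaves an empty line between entries. The snippet below is a minimal sketch of that read-side behaviour only. It does not call reverse_readline, and the file name is made up.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "double_cr.txt")

# Bytes as they would appear on disk when "\r\n" is written with
# newline="\r\n" in text mode.
with open(path, "wb") as f:
    f.write(b"Line1\r\r\nLine2\r\r\nLine3")

# Default text mode (newline=None): "\r" and "\r\n" each become "\n".
with open(path, "r", encoding="utf-8") as f:
    lines = f.read().split("\n")

print(lines)  # ['Line1', '', 'Line2', '', 'Line3']
```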