looping over lines of a file object does not translate correctly

Question

looping over lines of a file object does not translate correctly

justinfx opened this issue 3 years ago · 11 comments

Hi there! Cool project. I gave it a play with some simple example code and noticed that looping over the lines of an open file object does not translate properly yet, so I thought I would leave this bug report here.

def main():
	fh = open("file.txt")
	for line in fh:
		print(lin)
	fh.close()
  
	with open("file2.txt") as fh2:
		for line in fh2:
			print(line)

if __name__ == '__main__':
  main()

Currently translates to:

package main

import "fmt"

func main() {
	fh := open("file.txt")
	for _, line := range fh {
		fmt.Println(lin)
	}
	fh.close()
	func() {
		fh2 := open("file2.txt")
		defer func() {
			if err := fh2.Close(); err != nil {
				panic(err)
			}
		}()
		for _, line := range fh2 {
			fmt.Println(line)
		}
	}()
}

Answer 1 · 2021-06-28T04:19:27.000Z

Thanks for the bug report. I actually didn't know you could do that -- I've always just used .readlines()!

Answer 2 · 2021-06-28T04:28:17.000Z

Ah yep. readlines() isn't memory efficient since it slurps in the whole file, into a list. If you loop over the file object it uses a generator to yield each line at a time. Probably maps well to a Go Scanner with lines?

Answer 3 · 2021-06-28T05:43:41.000Z

Easiest initial implementation looks like this (I needed something that would swap in 1:1 for a range loop, conjured up this):

package main

import (
	"bufio"
	"fmt"
	"os"
)

func main() {
	fh := func() *os.File {
		f, err := os.OpenFile("file.txt", os.O_RDONLY, 0o777)
		if err != nil {
			panic(err)
		}
		return f
	}()
	if sc := bufio.NewScanner(fh); sc.Scan() {
		for line, more, done := sc.Text(), sc.Scan(), false; !done; line, more, done = sc.Text(), sc.Scan(), !more {
			fmt.Println(line)
		}
	}
	if err := fh.Close(); err != nil {
		panic(err)
	}
	func() {
		fh2 := func() *os.File {
			f, err := os.OpenFile("file2.txt", os.O_RDONLY, 0o777)
			if err != nil {
				panic(err)
			}
			return f
		}()
		defer func() {
			if err := fh2.Close(); err != nil {
				panic(err)
			}
		}()
		if sc := bufio.NewScanner(fh2); sc.Scan() {
			for line, more, done := sc.Text(), sc.Scan(), false; !done; line, more, done = sc.Text(), sc.Scan(), !more {
				fmt.Println(line)
			}
		}
	}()
}

Any major issues?

Answer 4 · 2021-06-28T06:38:34.000Z

Really cool! My only comment is whether the scanner needs a final error check after the loop to make sure it wasn't a non EOF error?

Answer 5 · 2021-06-28T06:40:40.000Z

~~Any idea how I can trigger an EOF for testing?~~ Nevermind lol

EDIT: Oh, I just realized, Python preserves line endings where this implementation does not. Hrm....

Would an EOF check also need to go in the else block?

	if sc := bufio.NewScanner(fh); sc.Scan() {
		for line, more, done := sc.Bytes(), sc.Scan(), false; !done; line, more, done = sc.Bytes(), sc.Scan(), !more {
			fmt.Println(line)
		}
		if err := sc.Err(); err != nil {
			panic(err)
		}
	} else if err := sc.Err(); err != nil {
		panic(err)
	}

I came up with this which looks better although it means I have to replace one statement with three which I don't loveee doing...

	sc := bufio.NewScanner(fh);
	for more, line := sc.Scan(), sc.Text(); more; more, line = sc.Scan(), sc.Text() {
		fmt.Println(line)
	}
	if err := sc.Err(); err != nil {
		panic(err)
	}

Maybe this

	if sc := bufio.NewScanner(fh); true {
		for more, line := sc.Scan(), sc.Text(); more; more, line = sc.Scan(), sc.Text() {
			fmt.Println(line)
		}
		if err := sc.Err(); err != nil {
			panic(err)
		}
	}

still need to figure out how to preserve line endings

Answer 6 · 2021-06-28T07:34:46.000Z

I'm not used to seeing so much logic crammed into a smaller space when using a scanner. Usually I just create the scanner and then for loop while Scan returns true.

That aside, the newline thing is an extra bit of trickiness huh? I've solved that in the past, if I remember right, with a custom Scan split function which I think would let you capture the newline into some state so you can preserve it.
A second idea would be to wrap the file in an extra buffer so that maybe you can check the position after each scan?

Answer 7 · 2021-06-28T07:47:41.000Z

I think I'm willing to just let it be slightly different from Python for the time being until someone can show me a clean implementation that perfectly encapsulates Python's behavior while not looking like a complete trainwreck in Go. 99% of the time when I get newlines in Python I'm stripping them off anyway, not sure about other people's use cases.

	if sc := bufio.NewScanner(fh); true {
		for sc.Scan() {
			line := sc.Text() + "\n"
			fmt.Println(line)
		}
		if err := sc.Err(); err != nil {
			panic(err)
		}
	}

Think this works for now? I don't love inserting new statements inside people's loops but I think it's the right thing to do here as long as it's contained somehow (the outer statement is still one statement -- for becomes an if)

This implementation includes some gotchas like adding on new lines that don't really exist and removing carriage returns. But, the alternatives seem really verbose for the average use case...

Eventually, I'd like to incorporate settings into this program. The behavior here is probably a good candidate for a setting. A python-like implementation seems like it'd need a very large custom splitter-- I could throw that down at the bottom of the program but it's not pretty and usually not necessary: https://stackoverflow.com/questions/41433422/read-lines-from-a-file-with-variable-line-endings-in-go

Answer 8 · 2021-06-28T10:32:21.000Z

I feel like if you are transpiling a python program, then you might have to support retaining the original behaviour of the line endings. What if the python program expects to process those but then the Go translation becomes incorrect? I agree that it's a pain that it is different.

line := sc.Text() + "\n"
fmt.Println(line)

Is that a double new line? Maybe you want fmt.Println without adding your own line ending? Or maybe you want fmt.Print with your own line ending added?

You might be able to generate a small wrapper around the existing ScanLine split function that looks at the line ending? I'm over simplifying and could probably offer an example once I am back to my workstation. Maybe it can't be avoided that some private wrapper functions might have to be generated in order to keep the main code readable and matching the original python behaviour?

Answer 9 · 2021-06-28T12:34:20.000Z

Is that a double new line?

Yes, this is consistent with Python's behavior in the original program (except for on the last line if it is missing a line ending). print() in Python is println. I plan on adding end= support and having a special case for end='' to give you printf

EDIT:

	if sc, line, err := bufio.NewReader(fh), "", *new(error); true {
		for {
			line, err = sc.ReadString('\n')
			if err != nil && (err == io.EOF && line == "" || err != io.EOF) {
				break
			}
			fmt.Println(line)  # Body goes here
		}
		if err != io.EOF {
			panic(err)
		}
	}

How about this?

Not too verbose
Preserves line endings
Single statement
Should handle ErrUnexpectedEOF
Should handle when buffer size is over 64K
Works well with continue and break statements added later in the body (because it avoids appending anything to the end of the loop body)
Errors immediately by breaking into an error check if a non-EOF error is encountered.
Does not execute the body for empty lines after an EOF is reached

https://stackoverflow.com/questions/8757389/reading-a-file-line-by-line-in-go

Unrelated but in trying to solve this problem I found https://github.com/segmentio/golines -- might add this as a postprocessing step :P

Answer 10 · 2021-06-28T18:10:00.000Z

Oh man. Sorry about the Println suggestion. Ignore me!

That new version of the line reading definitely looks like it covers all the issues. NIce one. Does it know not to panic if the original python code used exception handling?

Re:

if sc, line, err := bufio.NewReader(fh), "", *new(error); true { ... }

Is this idiom being used to ensure the generated code is using smaller scoping for the variables to avoid potential conflicts?

Answer 11 · 2021-06-28T20:38:22.000Z

Does it know not to panic if the original python code used exception handling?

try/except is obviously tricky in Go -- I do have a toy implementation of it which users recover (CTRL+F for "except" in the README) -- and I'll add a line to inform it to check for ErrUnexpectedEOF when there's an except for EOFError for now -- but, obviously this is far from how you'd want it done in Golang.

Trying to perform more specialized implementations of try/except handling actual error objects would be very nice although not easy... I'm open to hearing some ideas about how that might generalize. Maybe creating an anonymous error handling function with the except block and applying it to every error inside of the body with a panic? But then any functions would need to return errors instead of panicking which isn't really a pattern you see in Python...

Is this idiom being used to ensure the generated code is using smaller scoping for the variables to avoid potential conflicts?

Yup. The code for handling scope in this library is very rough -- the scope rules in Python and Go are super different and that is something I haven't tackled head-on yet. I've considered doing a complete overhaul of this by augmenting the AST with the scope of all variables using something like https://github.com/kavigupta/ast_scope/ (I've checked it out -- it's good but incompatible license) or https://github.com/PyCQA/astroid (haven't checked it out but it's actively maintained and has a compatible license).

So, using hacks like this help get around complications that might arise with scope (especially when dealing with nesting, I do very little variable renaming presently) while keeping the code close to 1:1 with the original (a statement for a statement, an expression for an expression).