golang/go

x/tools/txtar: Parse does not cope with CR-LF line endings

rogpeppe opened this issue · 5 comments

What version of Go are you using (go version)?

$ go version
go version devel go1.21-a4d5fbc3a4 Thu Feb 9 21:14:00 2023 +0000 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/rogpeppe/.cache/go-build"
GOENV="/home/rogpeppe/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/rogpeppe/src/go/pkg/mod"
GONOPROXY="github.com/cue-unity"
GONOSUMDB="github.com/cue-unity"
GOOS="linux"
GOPATH="/home/rogpeppe/src/go"
GOPRIVATE="github.com/cue-unity"
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/home/rogpeppe/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/home/rogpeppe/go/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="devel go1.21-a4d5fbc3a4 Thu Feb 9 21:14:00 2023 +0000"
GCCGO="gccgo"
GOAMD64="v1"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/dev/null"
GOWORK=""
CGO_CFLAGS="-O2 -g"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-O2 -g"
CGO_FFLAGS="-O2 -g"
CGO_LDFLAGS="-O2 -g"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build1790663412=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Ran this testscript test:

exec go mod tidy
exec go run main.go
cmp stdout expect-stdout
-- go.mod --
module m

go 1.20

require golang.org/x/tools v0.7.0
-- main.go --

package main

import (
	"fmt"

	"golang.org/x/tools/txtar"
)

func main() {
	ar := txtar.Parse([]byte("comment\r\n-- file --\r\ndata\r\n"))
	fmt.Printf("comment: %q\n", ar.Comment)
	for _, f := range ar.Files {
		fmt.Printf("file %q: %q\n", f.Name, f.Data)
	}
}
-- expect-stdout --
comment: "comment\r\n"
file: "file": "data\r\n"

What did you expect to see?

A passing test.

What did you see instead?

> exec go mod tidy
> exec go run main.go
[stdout]
comment: "comment\r\n-- file --\r\ndata\r\n"
> cmp stdout expect-stdout
--- stdout
+++ expect-stdout
@@ -1,1 +0,0 @@
-comment: "comment\r\n-- file --\r\ndata\r\n"
@@ -0,0 +1,2 @@
+comment: "comment\r\n"
+file: "file": "data\r\n"

FAIL: /tmp/testscript86563551/x.txtar/script.txtar:3: stdout and expect-stdout differ

The logic inside the txtar package does not properly recognise CR-LF files.
It's an open question whether the logic should normalize \r\n to \n.

> The logic inside the txtar package does not properly recognise CR-LF files.

Agreed, that should probably be fixed.

> It's an open question whether the logic should normalize \r\n to \n.

I would say that it should not.
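For illustration, here is a minimal sketch of the behaviour discussed above: marker lines are recognised whether they end in \n or \r\n, and the bytes of the comment and file data are kept as-is, with no normalization. This is not the x/tools/txtar implementation; `markerName`, `parse`, and the `file` type are hypothetical names invented for the example, and details such as adding a missing final newline are left out.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
)

// file is a hypothetical stand-in for txtar.File.
type file struct {
	name string
	data []byte
}

// markerName is a hypothetical helper, not part of x/tools/txtar.
// It reports the file name if line is a "-- name --" marker line,
// accepting either "\n" or "\r\n" as the line terminator.
func markerName(line []byte) (string, bool) {
	line = bytes.TrimSuffix(line, []byte("\n"))
	line = bytes.TrimSuffix(line, []byte("\r"))
	const start, end = "-- ", " --"
	if len(line) < len(start)+len(end) ||
		!bytes.HasPrefix(line, []byte(start)) ||
		!bytes.HasSuffix(line, []byte(end)) {
		return "", false
	}
	name := strings.TrimSpace(string(line[len(start) : len(line)-len(end)]))
	return name, name != ""
}

// parse splits data into a leading comment and a list of files.
// Every non-marker line keeps its original terminator.
func parse(data []byte) (comment []byte, files []file) {
	for _, line := range bytes.SplitAfter(data, []byte("\n")) {
		if name, ok := markerName(line); ok {
			files = append(files, file{name: name})
		} else if n := len(files); n > 0 {
			files[n-1].data = append(files[n-1].data, line...)
		} else {
			comment = append(comment, line...)
		}
	}
	return comment, files
}

func main() {
	comment, files := parse([]byte("comment\r\n-- file --\r\ndata\r\n"))
	fmt.Printf("comment: %q\n", comment)
	for _, f := range files {
		fmt.Printf("file %q: %q\n", f.name, f.data)
	}
}
```

With the input from the report, this sketch prints the comment as "comment\r\n" and a single file named "file" whose data is "data\r\n".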

@bcmills I would like to work on this.

Should the parsed structure be formatted back to a string using \n?
For example:

Archive{
	Comment: []byte("comment1\r\ncomment2\r\n"),
	Files: []File{
		{"file1", []byte("File 1 text.\r\n-- foo ---\r\nMore file 1 text.\r\n")},
		{"file 2", []byte("File 2 text.\r\n")},
		{"empty", []byte{}},
		{"noNL", []byte("hello world")},
	},
}

Should this be formatted to

"comment1\r\n" +
	"comment2\r\n" +
	"-- file1 --\n" +
	"File 1 text.\r\n" +
	"-- foo ---\r\n" +
	"More file 1 text.\r\n" +
	"-- file 2 --\n" +
	"File 2 text.\r\n" +
	"-- empty --\n" +
	"-- noNL --\n" +
	"hello world\n"

or

"comment1\n" +
	"comment2\n" +
	"-- file1 --\n" +
	"File 1 text.\n" +
	"-- foo ---\n" +
	"More file 1 text.\n" +
	"-- file 2 --\n" +
	"File 2 text.\n" +
	"-- empty --\n" +
	"-- noNL --\n" +
	"hello world\n"

or even to a version that uses only \r\n? That would require first scanning the data stored in the structure.

Good question. Perhaps add a UseCRLF field to the Archive struct, analogous to the same field on the csv.Writer struct?
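A rough sketch of what that suggestion might look like, assuming a hypothetical `UseCRLF` field that only controls the terminators the formatter itself writes (marker lines and a missing final newline), leaving the stored bytes alone. The `Archive` and `File` types below are local stand-ins that mirror the txtar shapes; neither the field nor `format` exists in x/tools/txtar as far as this thread shows.

```go
package main

import (
	"bytes"
	"fmt"
)

// File and Archive are local stand-ins for the txtar types.
// UseCRLF is the hypothetical field suggested in this thread.
type File struct {
	Name string
	Data []byte
}

type Archive struct {
	Comment []byte
	Files   []File
	UseCRLF bool // if true, the formatter writes "\r\n" instead of "\n"
}

// format serializes a. Marker lines use the chosen terminator; comment
// and file data are written unchanged, and a terminator is appended only
// when non-empty data does not already end in a newline.
func format(a *Archive) []byte {
	nl := "\n"
	if a.UseCRLF {
		nl = "\r\n"
	}
	var buf bytes.Buffer
	writeData := func(data []byte) {
		buf.Write(data)
		if len(data) > 0 && !bytes.HasSuffix(data, []byte("\n")) {
			buf.WriteString(nl)
		}
	}
	writeData(a.Comment)
	for _, f := range a.Files {
		buf.WriteString("-- " + f.Name + " --" + nl)
		writeData(f.Data)
	}
	return buf.Bytes()
}

func main() {
	a := &Archive{
		Comment: []byte("comment1\r\n"),
		Files: []File{
			{"empty", []byte{}},
			{"noNL", []byte("hello world")},
		},
		UseCRLF: true,
	}
	fmt.Printf("%q\n", format(a))
}
```

Under these assumptions the caller chooses the line ending explicitly, so the formatter does not need to scan the stored data to guess which one to use.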

Change https://go.dev/cl/483335 mentions this issue: txtar: add CRLF handling