proposal: compress/gzip: add distinct error for trailing garbage data
rgmz opened this issue · 3 comments
Proposal Details
Context
When using the compress/gzip
package to decompress gzip files, receiving a gzip: invalid header
error can indicate two distinct possibilities.
-
The file is not a valid gzip file.
Example test case (click to expand)
import ( "compress/gzip" "io" "os" "testing" ) func TestNotAValidGzipFile(t *testing.T) { f, err := os.Open("/tmp/python-3.12.3-amd64.exe") // Arbitrary example, any non-gzip file will suffice. if err != nil { t.Fatal(err) } defer f.Close() rc, err := gzip.NewReader(f) if err != nil { t.Fatal(err) } defer rc.Close() _, err = io.ReadAll(rc) if err != nil { t.Fatal(err) } } // === RUN TestNotAValidGzipFile // gzip_test.go:19: gzip: invalid header
-
The file is a valid gzip file, but contains invalid trailing data.
Example test case (click to expand)
func TestValidGzipFileWithTrailingData(t *testing.T) { // Reproducer file. There are many examples of this. // https://github.com/udacity/self-driving-car-sim/blob/4b1f739ebda9ed4920fe895ee3677bd4ccb79218/Assets/Standard%20Assets/Environment/SpeedTree/Conifer/Conifer_Desktop.spm f, err := os.Open("/tmp/Conifer_Desktop.spm") if err != nil { t.Fatal(err) } defer f.Close() rc, err := gzip.NewReader(f) if err != nil { t.Fatal(err) } defer rc.Close() _, err = io.ReadAll(rc) if err != nil { t.Fatal(err) } } // === RUN TestValidGzipFileWithTrailingData // gzip_test.go:19: gzip: invalid header
Scenario 2 can be especially confusing because Go's implementation of compress/gzip
rejects invalid trailing data, while many popular applications and languages do not. Hence, the ambiguity of this error can lead people to believe that Go is rejecting a valid gzip file.
$ file -i Conifer_Desktop.spm
Conifer_Desktop.spm: application/gzip; charset=binary
$ gzip -S .spm -d /tmp/Conifer_Desktop.spm
gzip: stdin: decompression OK, trailing garbage ignored.
Side-note: #47809 (comment) explains why this choice was made and how to handle invalid trailing data using gzip.Reader.Multistream
.
Proposal Details
I propose adding a distinct error for when subsequent inputs in a data stream don't have valid headers (i.e., trailing garbage). This would provide clarity for users who many not realize that they need to disable gzip.Reader.Multistream
. Additionally, it could be used to programmatically call Multistream(false)
for specific files with trailing garbage.
I will leave the specific error message and behave for future discussion, should there be interest in implementing this proposal.
Caveat: I am not familiar with the implementation of gzip.Reader
. This change may not be possible or desirable due to technical limitations.
Similar Issues
- compress/gzip: ignore trailing garbage data #47809
- compress/gzip: return different error for trailing garbage #61797
- compress/gzip: multistream reading fails on os.File #30230
(Emoji vote if this was helpful or unhelpful; more detailed feedback welcome in this discussion.)
Similar Issues
https://github.com/golang/go/issues/47809 https://github.com/golang/go/issues/61797
Drats, I spent quite a bit of time searching existing issues. Not sure how I missed the second one.