golang/go

compress/lzw: compress/decompress corrupts data

dvyukov opened this issue · 5 comments

The following program panics:

package main

import (
    "bytes"
    "compress/lzw"
    "fmt"
    "io/ioutil"
)

func main() {
    uncomp := []byte("a")
    buf := new(bytes.Buffer)
    w := lzw.NewWriter(buf, lzw.LSB, 2)
    _, err := w.Write(uncomp)
    if err != nil {
        panic(err)
    }
    if err := w.Close(); err != nil {
        panic(err)
    }
    r1 := lzw.NewReader(buf, lzw.LSB, 2)
    uncomp1, err := ioutil.ReadAll(r1)
    if err != nil {
        panic(err)
    }
    if !bytes.Equal(uncomp, uncomp1) {
        fmt.Printf("data0: %q\n", uncomp)
        fmt.Printf("data0: %q\n", uncomp1)
        panic("data differs")
    }
}

Output:

data0: "a"
data0: "\x01"
panic: data differs

go version devel +b0532a9 Mon Jun 8 05:13:15 2015 +0000 linux/amd64

Is it because of the litWidth argument?
Experiments suggest that litWidth is the number of low bits that get encoded from every input byte.

dsnet commented

I don't know too much about lzw, but comments say that the litWidth value controls the "number of bits to use for literal codes". Thus, if the value is set to 2, doesn't that mean you can only encode the literals 0x00, 0x01, 0x02, and 0x03?

In fact, this seems to be what's happening since the above code works when uncomp is set to \x00, \x01, \x02, or \x03. It also seems that the incorrect output value is the input value modulo 4.
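To make the modulo-4 observation concrete, here is a minimal sketch of the same round trip run with two different litWidth values. The helper names roundTrip and lowBits are made up for this illustration and are not part of compress/lzw. What the litWidth=2 case does depends on the Go version: the build reported above silently returns "\x01", while later releases, as far as I know, make Write return an error for bytes that do not fit in litWidth bits.

```go
package main

import (
	"bytes"
	"compress/lzw"
	"fmt"
	"io"
)

// roundTrip (hypothetical helper) compresses data with the given litWidth
// and immediately decompresses it with the same settings.
func roundTrip(data []byte, litWidth int) ([]byte, error) {
	var buf bytes.Buffer
	w := lzw.NewWriter(&buf, lzw.LSB, litWidth)
	if _, err := w.Write(data); err != nil {
		return nil, err
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	r := lzw.NewReader(&buf, lzw.LSB, litWidth)
	defer r.Close()
	return io.ReadAll(r)
}

// lowBits (hypothetical helper) keeps only the low litWidth bits of b,
// which is the value the issue observes coming back out.
func lowBits(b byte, litWidth int) byte {
	return b & byte(int(1)<<litWidth-1)
}

func main() {
	// 'a' is 0x61; its low 2 bits are 0x01, i.e. 0x61 modulo 4.
	fmt.Printf("low 2 bits of %q: %#x\n", 'a', lowBits('a', 2))

	for _, width := range []int{2, 8} {
		out, err := roundTrip([]byte("a"), width)
		fmt.Printf("litWidth=%d: out=%q err=%v\n", width, out, err)
	}
}
```

With litWidth set to 8 every byte value is a valid literal, so arbitrary input round-trips unchanged.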

If the encoder/decoder is working properly, maybe Write should output an error if the user tries to encode data with literals that are too large? In the horrendous off-chance that other formats depend on this degenerate behavior, then the library should at least document it?
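As a rough sketch of what such a check could look like on the caller's side, the function below rejects input bytes that do not fit in litWidth bits before anything is handed to the writer. The name checkLiterals and its error messages are assumptions made for illustration; this is not the package's API and not necessarily how a fix in the library would be written.

```go
package main

import "fmt"

// checkLiterals (hypothetical helper) reports an error if any byte in p
// cannot be represented as a literal code of litWidth bits.
func checkLiterals(p []byte, litWidth int) error {
	// compress/lzw documents litWidth as being in the range [2,8].
	if litWidth < 2 || litWidth > 8 {
		return fmt.Errorf("litWidth %d is outside the range [2,8]", litWidth)
	}
	maxLit := byte(int(1)<<litWidth - 1)
	for i, b := range p {
		if b > maxLit {
			return fmt.Errorf("byte %#x at offset %d does not fit in %d bits", b, i, litWidth)
		}
	}
	return nil
}

func main() {
	// "a" (0x61) needs more than 2 bits, so this reports an error.
	fmt.Println(checkLiterals([]byte("a"), 2))
	// Bytes 0..3 all fit in 2 bits, so this reports no error.
	fmt.Println(checkLiterals([]byte{0, 1, 2, 3}, 2))
}
```

A caller could run this before Write and fail early instead of discovering the mismatch after decompression.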

@dsnet Yes, my current understanding is that there is no bug in the code.
I don't know whether it is worth a runtime check or not; maybe it is meant to be obvious to anybody using the package. However, the docs are quite cryptic ("number of bits to use for literal codes"). When I first read it, I interpreted it as some parameter of the compression algorithm.

CL https://golang.org/cl/11063 mentions this issue.

CL https://golang.org/cl/11227 mentions this issue.