golang/go

image/gif: gif decoding fails `with too much image data`

odeke-em opened this issue · 15 comments

Please answer these questions before submitting your issue. Thanks!

  1. What version of Go are you using (go version)?
    go version devel +1f44643 Wed Jun 22 00:12:55 2016 +0000 darwin/amd64
  2. What operating system and processor architecture are you using (go env)?
    GOARCH="amd64"
    GOBIN="/Users/emmanuelodeke/go/bin"
    GOEXE=""
    GOHOSTARCH="amd64"
    GOHOSTOS="darwin"
    GOOS="darwin"
    GOPATH="/Users/emmanuelodeke/go"
    GORACE=""
    GOROOT="/Users/emmanuelodeke/go/src/go.googlesource.com/go"
    GOTOOLDIR="/Users/emmanuelodeke/go/src/go.googlesource.com/go/pkg/tool/darwin_amd64"
    CC="clang"
    GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/kn/s30g1srd4h50bh6sd322tppm0000gn/T/go-build038095302=/tmp/go-build -gno-record-gcc-switches -fno-common"
    CXX="clang++"
    CGO_ENABLED="1"
  3. What did you do?
    Run program at https://play.golang.org/p/NqqvAkiIao with source of http://ualberta.ca/~odeke/tx.gif
$ go run NqqvAkiIao.go --source http://ualberta.ca/~odeke/tx.gif

If possible, provide a recipe for reproducing the error.
A complete runnable program is good.
A link on play.golang.org is best.

  1. What did you expect to see?
    Successful decoding of a gif with no errors
  2. What did you see instead?
    gif: too much image data

/cc @montanaflynn who reported this issue first.

Code inlined

/*
  After running this gif
      https://67.media.tumblr.com//b02659b27081314a412215f4eb31dacf/tumblr_nu2x4whJgy1udf6d3o1_400.gif
  through ffmpeg, Go fails to decode the gif with error:
      `gif: too much image data`

  Command:
  $ ffmpeg -i https://67.media.tumblr.com//b02659b27081314a412215f4eb31dacf/tumblr_nu2x4whJgy1udf6d3o1_400.gif outf.gif
  $ go run main.go --source outf.gif
  - Repro URL
  http://ualberta.ca/~odeke/tx.gif

  Interestingly:
  - The outfile is 1.0MB.
  - The original file is 1.4MB.
*/
package main

import (
    "flag"
    "image/gif"
    "log"
    "net/http"
    "path/filepath"
    "strings"
)

func localAndNetBasedClient() *http.Client {
    transport := new(http.Transport)
    transport.RegisterProtocol("file", http.NewFileTransport(http.Dir("/")))
    client := new(http.Client)
    client.Transport = transport
    return client
}

var httpPrefixes = []string{"http", "https"}

func ensureFitForFetch(source string) string {
    for _, prefix := range httpPrefixes {
        if strings.HasPrefix(source, prefix) {
            return source
        }
    }

    absSource, err := filepath.Abs(source)
    if err == nil {
        source = absSource
    }
    return strings.Join([]string{"file://", source}, "/")
}

func main() {
    var source string
    flag.StringVar(&source, "source",
        "https://67.media.tumblr.com//b02659b27081314a412215f4eb31dacf/tumblr_nu2x4whJgy1udf6d3o1_400.gif",
        "the URI of the gif you want to ensure properly decodes in Go")
    flag.Parse()

    client := localAndNetBasedClient()
    source = ensureFitForFetch(source)
    res, err := client.Get(source)
    if err != nil {
        log.Fatal(err)
    }

    defer res.Body.Close()

    if _, err := gif.DecodeAll(res.Body); err != nil {
        log.Fatal(err)
    }
}

CC @nigeltao

This happens with every Go release since 1.1, so it is not a new problem.

With 1.0.3 I get gif: extra data after image.

vsiv commented

Any updates on this, we process thousands of images daily, and run into this everyday, albeit on a very small number of them.

Thanks.

No update, but it would be helpful if you could attach a small (in terms of file size) GIF that exhibits this problem. The original post linked to something weighing 1 MB but that's still kind of unwieldy to debug. The smallest 'bad' GIF you've got would be great.

vsiv commented

Most of our images come from giphy.com where, a typical GIF is > 1MB, so I havent been able to find one thats smaller - will post if I find one.

There is one that's quite wonky - its smaller but fails with a different message -
http://a3.fanbread.com/uploads/listicle/featured_image/65227/extra_large_cropped_wino.gif
fails with gif: frame bounds larger than image bounds

@nigeltao here is one that's under 500kb and less than 15 frames: https://j.gifs.com/Z6KoKJ.gif

It seems much more prominent on larger gifs though.

horgh commented

I've debugged this. I have an explanation and a recommendation.

The problem gifs in question (tx.gif from @odeke-em and Z6KoKJ.gif from @montanaflynn) indeed have extra/too much data. In both cases, the extra data is a data sub-block in the image data with size 1 containing 0x00. In both cases, there is an LZW End of Information Code, then this extra sub-block, and then the Block Terminator.

In tx.gif this happens in the second image in the gif. In Z6KoKJ.gif it happens in the 11th image in the gif.

In well formed gifs, the last sub-block would not be present. It does not need to be there since we completed decompression.

Currently the decoder fully reads/decompresses the image's data, and then uses the block reader to check whether there is more. It sees there is more data (in this case, there is a block of 1 byte found) and raises this error.

If we don't decompress the data and simply look at the image data as a series of data sub-blocks, it is not extra. That is, the data sub-blocks are well formed. It is extra only when we take into account reaching the LZW End of Information code.

The question is what we should be doing. Other decoders (GIFLIB, browsers) apparently accept this. As far as the specification goes, it does not say one way or the other. In Appendix F it says "the code that terminates the LZW compressed data must appear before Block Terminator". It does not state what to do if there is data after the LZW data ends, nor that the code must be the last piece of data. Section 22 states that "[t]he sequence of indices is encoded using the LZW Algorithm [...]" so I think this data should not be treated as an index. But it is not useful and presumably is technically invalid.

I looked at what the most recent version of GIFLIB (5.1.4) does. After it fully decompresses an image, it silently reads and discards any extra data such as this. It stops when it reads the Block Terminator. You can see this in its DGifGetLine() function (the do while loop). I think doing this would work for Go's decoder too. Once we read and decompress the image, any extra data in its sub-blocks can be discarded. There should not be any, but if there are, ignore them.

As for where these problem gifs are coming from, given the transformation using ffmpeg produced one with the defect, its gif encoder may have a bug!

Note I didn't analyze @vsiv's extra_large_cropped_wino.gif. It appears corrupted to me in Firefox and is showing a different error, so possibly it is either truly corrupted or a different issue.

I wrote a patch with a possible solution. I realize this will need to be submitted differently, but I thought I would include it and get feedback here before doing so. https://github.com/horgh/go/commit/ab144b8f336c0b99eec9b416a7577a305190dbdc

Thanks for any feedback!

Hi @horgh, nice debugging work. I suggest you submit a CL. We cannot review code without a signed CLA, so a Gerrit CL is the only format for discussing patches.

@horgh, great debugging.

@nigeltao, thoughts?

vsiv commented

@horgh - thanks for the debugging!
All - In an ideal world, if chrome/safari can handle a gif without user errors, golang accomodates it as well - but i understand this can be a slippery slope, because as u pointed out, some of these gif encoders have bugs in them, and many being used in existence are older versions of those libs, so even though the lib fixes the issue, new gif's appear on the web with the buggy encoding. This is similar to forgiving html authoring errors, that browsers accomodate.

Another option is to work out an external gif decoder lib here on github for concerning users to use if needed and leave the stricter lib as part of the golang distribution. thanks.

CL https://golang.org/cl/37258 mentions this issue.

horgh commented

Thanks all! I've uploaded the possible patch to Gerrit now too: https://go-review.googlesource.com/37258 (oh I see @gopherbot beat me!)

Could it be limited somehow, how much stray data is allowed? Otherwise it feels like a new attack vector. What do other gif libraries do to limit it?

horgh commented

Good point @nightlyone.

Other gif libraries:

  • GIFLIB has no limit on how much extra data it reads (see the function I referenced in my first comment)
  • ffmpeg's decoder also has no limit as best I can tell. See its gif_read_image() call to ff_lzw_decode_tail() which states "read the garbage data until end marker is found". The while loop in the latter function reads and skips bytes until it reaches a block terminator.

That is not to say we need to use that same approach here. In the two samples mentioned in this issue, only a single byte is extra. We could be conservative and allow as little as 1 byte or some arbitrary small limit.

@horgh great debugging indeed.

I'd like to limit the slack to at most one extra byte. We can continue the conversation on the code review.

Change https://golang.org/cl/68350 mentions this issue: image/gif: make blockReader a ByteReader, harden tests