golang/go

encoding/gob: excessive memory consumption

dvyukov opened this issue · 15 comments

go version devel +8ea2438 Fri Apr 17 13:44:30 2015 +0300 linux/amd64
with https://go-review.googlesource.com/#/c/8942/ applied.

The following program consumes 2GB of memory while the input is only a few bytes. I suspect it is possible to modify the input to force the program to allocate an arbitrary amount of memory and run for an arbitrary period of time.

Allocations happen while receiving a wire type (encoding/gob.(*Decoder).recvType).

package main

import (
    "bytes"
    "encoding/gob"
    "encoding/hex"
)

type X struct {
}

func main() {
    // Corrupted gob input: it claims a type definition with an
    // enormous number of fields (see discussion below).
    data, _ := hex.DecodeString("53ff8f03010106696e7075745401ff900001fc0701044d61786901040001044d" +
        "696e6901040001044d61787501060001044d61786601080001044d696e660108" +
        "0001044d617863010e0001044d696e63010e00000007ff9001fe0000")
    s := ""
    gob.NewDecoder(bytes.NewReader(data)).Decode(&s)
    var b []byte
    gob.NewDecoder(bytes.NewReader(data)).Decode(&b)
    var f float64
    gob.NewDecoder(bytes.NewReader(data)).Decode(&f)
    var x X
    gob.NewDecoder(bytes.NewReader(data)).Decode(&x)
}

This second input is probably a duplicate of the first, but please test it separately.

    data, _ := hex.DecodeString("53ff8f03010106696e7075745401ff900001fc0701044d61786901040001044d" +
        "696e6901040001044d61787501060001044d61786601080001044d696e660108" +
        "0001044d617863010e0001044d696e63010e00000007ff9001fe010000")

@robpike, ping. This is the last one for gob.

No need to ping. The milestone is sufficient.

The later this bug is fixed, the later in the release cycle the remaining bugs will be discovered.

I investigated this a bit, but need to stop for now. What I found is that when I turn on encoding/gob/debug.go and then call Debug(bytes.NewReader("...the hex...")), the variable numFields in debug.go gets set to an absurdly large number, undoubtedly due to the corrupted gob input.

I have not yet worked backwards to find out if there's a way that gob can tell that the number of fields it is being asked to keep track of is too big.

We know the input size. If the claimed number of entries is N and the minimum entry size is S, can't we check that N*S <= DATA_SIZE?
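
A minimal sketch of that bound check (all names here are hypothetical, not gob's actual internals):

package main

import "fmt"

// plausibleCount reports whether a message of dataLen bytes could
// possibly contain numEntries entries of at least minEntrySize bytes
// each. This is the N*S <= DATA_SIZE idea, with the comparison done
// by division to avoid overflow in N*S.
func plausibleCount(numEntries, minEntrySize, dataLen uint64) bool {
    if minEntrySize == 0 {
        minEntrySize = 1 // every entry occupies at least one byte on the wire
    }
    return numEntries <= dataLen/minEntrySize
}

func main() {
    // The corrupted input claims 117507149 fields in a 92-byte message.
    fmt.Println(plausibleCount(117507149, 1, 92)) // false: reject early
}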

Working as intended, if somewhat unfortunate.

The slices falsely encoded in these data are just under the cutoff for "too big". If they were slightly larger, they would be rejected outright.

In general, we can't compare the data length against the buffer size because the stream may contain type definitions, which means some of the data will be in a second, as-yet-unread message. We could put in a really complicated test for those cases that are guaranteed to be in this message, but since the goal here is hardening, that's pointless: other data types can still trigger the big allocation.

The data claims to contain a type definition with 117507149 fields. The type should be followed by 117507149 field definitions, each of which takes a non-zero number of bytes; 92 bytes of data can't possibly contain that.

That's not true in general. A slice can be stored in many messages, depending on its contents. Please leave this closed.

@robpike How is it possible that a slice is stored in several messages? The code seems to just read in all slice elements from the provided buffer.

I don't understand why it does not make sense to sanity-check the claimed data size against the real data size. If we have N elements with a minimal wire size of S, the input data must be at least N*S bytes. Please elaborate.

Because the slice may contain interface fields that require a type definition.

Is that an answer to the first question (a slice stored in several messages) or to the second (checking the data size)?
If a slice contains interface fields that require a type definition, what is huge here (the slice length, a single object, the type definition, or something else)?
I am not following you.

A slice can in general require as many messages as there are elements. Therefore it is not easy to say that this message is too short for the slice data. As I said above, it's feasible to check for certain slice types but not all, so I don't believe it's worth doing at all, since it can't be done properly.
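
To make that case concrete, here is a small illustrative example (the types A and B are made up): each interface element names its concrete type, and per the explanation above, previously unseen concrete types can pull type definitions into further messages, so the element count in one message does not bound the slice's total wire size.

package main

import (
    "bytes"
    "encoding/gob"
    "fmt"
)

type A struct{ X int }
type B struct{ Y string }

func main() {
    // Interface values can only be transmitted for registered types.
    gob.Register(A{})
    gob.Register(B{})

    var buf bytes.Buffer
    // The wire size of this slice depends on its contents, not just
    // its length: each element carries concrete-type information.
    v := []interface{}{A{1}, B{"hi"}, A{2}}
    if err := gob.NewEncoder(&buf).Encode(v); err != nil {
        fmt.Println("encode:", err)
        return
    }
    fmt.Println("encoded bytes:", buf.Len())
}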

Rob, have you already commented somewhere on the question, "how can I safely use package x when the input is untrusted and I want to parse an input without allowing it to consume an arbitrarily giant chunk of memory"? We ran across this same question w.r.t. decoding maliciously giant images (issue #5050 IIRC).

I have not.
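
For readers with the same question, one partial mitigation (an illustrative sketch, not an answer from this thread) is to cap the bytes a decoder may read with io.LimitReader; note that, as this issue demonstrates, bounding the input size does not by itself bound what the decoder allocates.

package main

import (
    "bytes"
    "encoding/gob"
    "io"
)

func main() {
    data := []byte{} // untrusted input, e.g. from the network
    // Cap how much input the decoder may consume. This bounds I/O,
    // but a tiny input can still claim huge element counts, so it
    // is not a complete defense on its own.
    const maxInput = 1 << 20 // 1 MiB; an arbitrary example limit
    r := io.LimitReader(bytes.NewReader(data), maxInput)
    var v string
    _ = gob.NewDecoder(r).Decode(&v) // error handling elided
}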