google/gofuzz

NewFromGoFuzz doesn't work well with go-fuzz

stevenjohnstone opened this issue · 1 comments

I've come across projects using go-fuzz with NewFromGoFuzz where fuzzing didn't catch obvious bugs. I've distilled these down into very simple, unstructured cases (which you'd probably not use gofuzz for) which demonstrate the problems. In the projects in questions, I've used JSON and protobuf unmarshalling of go-fuzz provided data into structs with better results. Not sure if there's anything obvious to do with this information except to document the downsides?

It's probably easiest to describe the problem with examples:

// +build gofuzz
package mypackage

import fuzz "github.com/google/gofuzz"

func MyFunc(i int) {
    if i == 1337 {
        panic("found correct i")
    }
}

func Fuzz(data []byte) int {
    var i int
    fuzz.NewFromGoFuzz(data).Fuzz(&i)
    MyFunc(i)
    return 0
}

The above (an extension of the README example) would have to run for a very long time to find a crasher. Go-fuzz has its sonar to guide it to new coverage gated by comparisons e.g. the following finds crashers almost immediately:

// +build gofuzz

package mypackage

import "encoding/binary"

func MyFunc(i int) {
    if i == 1337 {
        panic("found correct i")
    }
}

func Fuzz(data []byte) int {
    i := binary.BigEndian.Uint64(data)
    MyFunc(int(i))
    return 0
}

Decoding the data from go-fuzz as little-endian, hex encoded strings or ascii will also result in the crasher being found quickly as the number "1337" is used in the mutator in all of these forms.

I found that applying this patch

diff --git a/bytesource/bytesource.go b/bytesource/bytesource.go
index 5bb3659..efd6a65 100644
--- a/bytesource/bytesource.go
+++ b/bytesource/bytesource.go
@@ -40,7 +40,9 @@ func New(input []byte) *ByteSource {
                fallback: rand.NewSource(0),
        }
        if len(input) > 0 {
-               s.fallback = rand.NewSource(int64(s.consumeUint64()))
+               seed := make([]byte, 8)
+               copy(seed, input)
+               s.fallback = rand.NewSource(int64(binary.BigEndian.Uint64(seed)))
        }
        return s
 }

allowed go-fuzz to work a little bit better for this scenario. The improvement comes because the first 8 bytes of the go-fuzz input are not lost to the random number generator seed.

The following demonstrates how inputs from go-fuzz with string literals are unrecognisable after passing through NewFromGoFuzz.Fuzz:

package main

import (
    "fmt"

    fuzz "github.com/google/gofuzz"
)

func main() {
    data := []byte("-----BEGIN RSA PRIVATE KEY-----")
    var str string
    fuzz.NewFromGoFuzz(data).Fuzz(&str)
    fmt.Printf("str = %s\n", str)
}

outputs

str = Ȱ#ǻŐ蒉

Here, randString will consume at least 8*(n+1) bytes of the go-fuzz input (n is the length chosen for the string) in such a way as to "slice and dice" the input leaving it with no recognisable fragments. This prevents the fuzzer finding crashers for the following:

// +build gofuzz
package mypackage

import fuzz "github.com/google/gofuzz"

func MyFunc(str string) {
    if str == "1337" {
        panic("found correct")
    }
}

func Fuzz(data []byte) int {
    var str string
    fuzz.NewFromGoFuzz(data).Fuzz(&i)
    MyFunc(str)
    return 0
}

An equivalent fuzzer without gofuzz

// +build gofuzz
package mypackage

import fuzz "github.com/google/gofuzz"

func MyFunc(str string) {
    if str == "1337" {
        panic("found correct string")
    }
}

func Fuzz(data []byte) int {
    MyFunc([]byte(str))
    return 0
}

immediately finds a crasher. It should also be noted that strings from gofuzz will be limited in length and well-formed utf-8 which may be at odds with the aims for fuzzing.

Thanks for the detailed report! I'll think if there's anything we can do.