dvyukov/go-fuzz

Q: Debugging abysmal execution numbers

RonjaPonja opened this issue · 17 comments

Ahoi!

I'm trying to toy with go-fuzz by fuzzing the MCL language lexer/parser of mgmt. Unfortunately my tests are extremely slow on both macOS and Linux when go-fuzz calls my Fuzz function. That is not the case when I call the function myself.

My overly simple fuzz target in fuzz/fuzz.go looks like this:

package fuzz

import (
	"bytes"
	"github.com/purpleidea/mgmt/lang"
)

func Fuzz(data []byte) int {
	lang.LexParse(bytes.NewReader(data))
	return 0
}
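As an aside, the go-fuzz README documents the return value as a priority hint: 1 for inputs that should get higher priority during subsequent fuzzing (for example, they parsed successfully), -1 for inputs that must never be added to the corpus, and 0 otherwise. A minimal sketch of that pattern, with a stand-in parse function since lang.LexParse's exact signature isn't shown in this thread:

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// parse stands in for lang.LexParse (hypothetical; the real function's
// signature isn't shown here). It just checks for an MCL-style assignment.
func parse(r io.Reader) error {
	b, err := io.ReadAll(r)
	if err != nil {
		return err
	}
	if !bytes.ContainsRune(b, '=') {
		return fmt.Errorf("no assignment found")
	}
	return nil
}

// Fuzz returns 1 for inputs that parse successfully, telling go-fuzz to
// give them higher priority during mutation, and 0 for everything else.
func Fuzz(data []byte) int {
	if err := parse(bytes.NewReader(data)); err != nil {
		return 0
	}
	return 1
}

func main() {
	fmt.Println(Fuzz([]byte("$three = 3"))) // parses: 1
	fmt.Println(Fuzz([]byte("garbage")))    // does not: 0
}
```

Returning a meaningful value doesn't affect raw execution speed, but it helps go-fuzz steer the corpus toward syntactically valid inputs.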

fuzz/corpus/third.mcl only contains:

$three = 3

I'm utilizing this target via a simple main in main-fuzz.go like this:

package main

import (
	"io/ioutil"
	"github.com/purpleidea/mgmt/fuzz"
	"net/http"
	_ "net/http/pprof"
)

func main() {
	// we need a webserver to get the pprof webserver
	go func() {
		http.ListenAndServe("localhost:6060", nil)
	}()

	dat, _ := ioutil.ReadFile("./fuzz/corpus/third.mcl")
	for i := 1; i <= 1000; i++ {
		fuzz.Fuzz(dat)
	}
}

Running this takes only a few seconds. Not amazingly fast, but not too shabby either:

space@FeTAp ..epos/go/src/github.com/purpleidea/mgmt (git)-[master] % time go run main-fuzz.go
go run main-fuzz.go  2.44s user 1.02s system 106% cpu 3.251 total

Unfortunately, when building and running it with go-fuzz, I get abysmal execution numbers:

space@FeTAp ..go/src/github.com/purpleidea/mgmt/fuzz (git)-[master] % go-fuzz --procs=1 -bin=./fuzz-fuzz.zip -workdir=.
2020/02/18 12:52:26 workers: 1, corpus: 1 (3s ago), crashers: 0, restarts: 1/0, execs: 0 (0/sec), cover: 0, uptime: 3s
2020/02/18 12:52:29 workers: 1, corpus: 1 (6s ago), crashers: 0, restarts: 1/0, execs: 0 (0/sec), cover: 0, uptime: 6s
2020/02/18 12:52:32 workers: 1, corpus: 1 (9s ago), crashers: 0, restarts: 1/22, execs: 22 (2/sec), cover: 0, uptime: 9s
2020/02/18 12:52:35 workers: 1, corpus: 1 (12s ago), crashers: 0, restarts: 1/35, execs: 35 (3/sec), cover: 0, uptime: 12s
2020/02/18 12:52:38 workers: 1, corpus: 1 (15s ago), crashers: 0, restarts: 1/37, execs: 37 (2/sec), cover: 0, uptime: 15s
2020/02/18 12:52:41 workers: 1, corpus: 1 (18s ago), crashers: 0, restarts: 1/42, execs: 42 (2/sec), cover: 0, uptime: 18s
2020/02/18 12:52:44 workers: 1, corpus: 1 (21s ago), crashers: 0, restarts: 1/44, execs: 44 (2/sec), cover: 0, uptime: 21s
2020/02/18 12:52:47 workers: 1, corpus: 1 (24s ago), crashers: 0, restarts: 1/44, execs: 44 (2/sec), cover: 0, uptime: 24s
2020/02/18 12:52:50 workers: 1, corpus: 1 (27s ago), crashers: 0, restarts: 1/49, execs: 49 (2/sec), cover: 0, uptime: 27s
2020/02/18 12:52:53 workers: 1, corpus: 1 (30s ago), crashers: 0, restarts: 1/51, execs: 51 (2/sec), cover: 0, uptime: 30s

Any tips on how to debug this? Can I utilize pprof from inside go-fuzz somehow?

Please try to export CGO_ENABLED=0 and then re-run go-fuzz-build and go-fuzz.

That doesn't change anything unfortunately.

space@FeTAp ..go/src/github.com/purpleidea/mgmt/fuzz (git)-[master] % CGO_ENABLED=0 go-fuzz-build
CGO_ENABLED=0 go-fuzz-build  185.25s user 24.15s system 410% cpu 51.043 total

space@FeTAp ..go/src/github.com/purpleidea/mgmt/fuzz (git)-[master] % CGO_ENABLED=0 go-fuzz --procs=1 -bin=./fuzz-fuzz.zip -workdir=.
2020/02/18 14:37:23 workers: 1, corpus: 1 (3s ago), crashers: 0, restarts: 1/0, execs: 0 (0/sec), cover: 0, uptime: 3s
2020/02/18 14:37:26 workers: 1, corpus: 1 (6s ago), crashers: 0, restarts: 1/0, execs: 0 (0/sec), cover: 0, uptime: 6s
2020/02/18 14:37:29 workers: 1, corpus: 1 (9s ago), crashers: 0, restarts: 1/24, execs: 24 (3/sec), cover: 0, uptime: 9s
2020/02/18 14:37:32 workers: 1, corpus: 1 (12s ago), crashers: 0, restarts: 1/35, execs: 35 (3/sec), cover: 0, uptime: 12s
2020/02/18 14:37:35 workers: 1, corpus: 1 (15s ago), crashers: 0, restarts: 1/37, execs: 37 (2/sec), cover: 0, uptime: 15s
2020/02/18 14:37:38 workers: 1, corpus: 1 (18s ago), crashers: 0, restarts: 1/42, execs: 42 (2/sec), cover: 0, uptime: 18s

Note I had to make this small patch to make mgmt compile. It's nonsensical but good enough for this purpose.

diff --git a/lang/funcs/core/sys/load_fact.go b/lang/funcs/core/sys/load_fact.go
index 4ad5011..5ac42a5 100644
--- a/lang/funcs/core/sys/load_fact.go
+++ b/lang/funcs/core/sys/load_fact.go
@@ -30,7 +30,7 @@ const (
 )

 func init() {
-       facts.ModuleRegister(ModuleName, "load", func() facts.Fact { return &LoadFact{} }) // must register the fact and name
+       //facts.ModuleRegister(ModuleName, "load", func() facts.Fact { return &LoadFact{} }) // must register the fact and name
 }

 // LoadFact is a fact which returns the current system load.
@@ -82,7 +82,8 @@ func (obj *LoadFact) Stream() error {
                        return nil
                }

-               x1, x5, x15, err := load()
+               x1, x5, x15 := 1.1, 1.1, 1.1
+               var err error = nil
                if err != nil {
                        return errwrap.Wrapf(err, "could not read load values")
                }

I suspect the real problem is this: corpus: 1, cover: 0. I've seen this before with cgo enabled, when go-fuzz-build failed to instrument most packages. I don't remember if it also resulted in slow execution or not.

Note that cover does go up eventually if I keep it running for a few more seconds. Maybe it's really just dragging its feet due to the slow execution:

space@FeTAp ..go/src/github.com/purpleidea/mgmt/fuzz (git)-[master] % go-fuzz -bin=./fuzz-fuzz.zip -workdir=.
2020/02/18 15:26:02 workers: 8, corpus: 1 (3s ago), crashers: 0, restarts: 1/0, execs: 0 (0/sec), cover: 0, uptime: 3s
2020/02/18 15:26:05 workers: 8, corpus: 1 (6s ago), crashers: 0, restarts: 1/0, execs: 0 (0/sec), cover: 0, uptime: 6s
2020/02/18 15:26:08 workers: 8, corpus: 1 (9s ago), crashers: 0, restarts: 1/24, execs: 24 (3/sec), cover: 0, uptime: 9s
2020/02/18 15:26:11 workers: 8, corpus: 1 (12s ago), crashers: 0, restarts: 1/35, execs: 35 (3/sec), cover: 0, uptime: 12s
2020/02/18 15:26:14 workers: 8, corpus: 1 (15s ago), crashers: 0, restarts: 1/37, execs: 37 (2/sec), cover: 0, uptime: 15s
2020/02/18 15:26:17 workers: 8, corpus: 1 (18s ago), crashers: 0, restarts: 1/42, execs: 42 (2/sec), cover: 0, uptime: 18s
2020/02/18 15:26:20 workers: 8, corpus: 1 (21s ago), crashers: 0, restarts: 1/44, execs: 44 (2/sec), cover: 0, uptime: 21s
2020/02/18 15:26:23 workers: 8, corpus: 1 (24s ago), crashers: 0, restarts: 1/44, execs: 44 (2/sec), cover: 0, uptime: 24s
2020/02/18 15:26:26 workers: 8, corpus: 1 (27s ago), crashers: 0, restarts: 1/49, execs: 49 (2/sec), cover: 0, uptime: 27s
2020/02/18 15:26:29 workers: 8, corpus: 1 (30s ago), crashers: 0, restarts: 1/51, execs: 51 (2/sec), cover: 0, uptime: 30s
2020/02/18 15:26:32 workers: 8, corpus: 1 (33s ago), crashers: 0, restarts: 1/55, execs: 55 (2/sec), cover: 0, uptime: 33s
2020/02/18 15:26:35 workers: 8, corpus: 1 (36s ago), crashers: 0, restarts: 1/55, execs: 55 (2/sec), cover: 0, uptime: 36s
2020/02/18 15:26:38 workers: 8, corpus: 1 (39s ago), crashers: 0, restarts: 1/58, execs: 58 (1/sec), cover: 0, uptime: 39s
2020/02/18 15:26:41 workers: 8, corpus: 1 (42s ago), crashers: 0, restarts: 1/63, execs: 63 (1/sec), cover: 0, uptime: 42s
2020/02/18 15:26:44 workers: 8, corpus: 1 (45s ago), crashers: 0, restarts: 1/69, execs: 69 (2/sec), cover: 0, uptime: 45s
2020/02/18 15:26:47 workers: 8, corpus: 2 (0s ago), crashers: 0, restarts: 1/72, execs: 72 (1/sec), cover: 0, uptime: 48s
2020/02/18 15:26:50 workers: 8, corpus: 7 (1s ago), crashers: 0, restarts: 1/76, execs: 76 (1/sec), cover: 639, uptime: 51s
2020/02/18 15:26:53 workers: 8, corpus: 8 (1s ago), crashers: 0, restarts: 1/31, execs: 287 (5/sec), cover: 643, uptime: 54s
2020/02/18 15:26:56 workers: 8, corpus: 9 (0s ago), crashers: 0, restarts: 1/33, execs: 406 (7/sec), cover: 646, uptime: 57s
2020/02/18 15:26:59 workers: 8, corpus: 10 (2s ago), crashers: 0, restarts: 1/39, execs: 476 (8/sec), cover: 692, uptime: 1m0s
2020/02/18 15:27:02 workers: 8, corpus: 11 (1s ago), crashers: 0, restarts: 1/41, execs: 574 (9/sec), cover: 692, uptime: 1m3s
2020/02/18 15:27:05 workers: 8, corpus: 12 (2s ago), crashers: 0, restarts: 1/45, execs: 642 (10/sec), cover: 705, uptime: 1m6s
2020/02/18 15:27:08 workers: 8, corpus: 12 (5s ago), crashers: 0, restarts: 1/42, execs: 672 (10/sec), cover: 705, uptime: 1m9s
2020/02/18 15:27:11 workers: 8, corpus: 12 (8s ago), crashers: 0, restarts: 1/45, execs: 731 (10/sec), cover: 705, uptime: 1m12s
2020/02/18 15:27:14 workers: 8, corpus: 12 (11s ago), crashers: 0, restarts: 1/47, execs: 761 (10/sec), cover: 705, uptime: 1m15s

^ That is without CGO_ENABLED=0 during build or run.

Does the lexer emit a bunch of logging?

If you seed the corpus with a bunch of legitimate inputs, does fuzzing run faster (in terms of execs/sec)?

Does the lexer get much slower with longer inputs?

If you bump up/down the -procs arg to go-fuzz a lot, what impact does it have on execution speed? (Be sure to wait past the first warmup period of maybe 30s or so.)

Re: pprof from within fuzzing, I think so, mostly, with some work. From memory (I'm on my phone) you should do this: in your Fuzz function, start/stop profiling as usual, writing output to a randomly named file in a known directory. Then take the fuzz.zip and unzip it. Inside will be two binaries, one with sonar and one without. Unfortunately the profiles will be a mixed batch, but most will be for the non-sonar binary, so you can just use that and expect some random noise/gibberish from the profiles taken using the sonar binary.

Note that cover does go up eventually if I keep it running for a few more seconds.

Ah, then it's something different.
Does it consume 100% CPU? Is it go-fuzz or the subprocess? Profiling one/both with perf may shed some light.

Does the lexer emit a bunch of logging?

It doesn't output anything, as you can see in the examples above when I call it with my dummy main.

If you seed the corpus with a bunch of legitimate inputs, does fuzzing run faster (in terms of execs/sec)?

No, generally slower, since all my other examples are more complex than that one-liner.

Does the lexer get much slower with longer inputs?

It does, but reasonably so, I would say. With my dummy main and that 11-byte sample I get about 16000 execs per second. With a 1154-byte sample I get about 490 execs per second.

If you bump up/down the -procs arg to go-fuzz a lot, what impact does it have on execution speed? (Be sure to wait past the first warmup period of maybe 30s or so.)

It helps as long as I don't go much higher than I have cores. Still only a few executions per second though. I have a very similar go-fuzz instance that has been running on my server for about 24 hours now, and it has dropped to 1 exec/sec using 6 procs.

Re: pprof from within fuzzing, I think so, mostly, with some work. From memory (I'm on my phone) you should do this: in your Fuzz function, start/stop profiling as usual, writing output to a randomly named file in a known directory. Then take the fuzz.zip and unzip it. Inside will be two binaries, one with sonar and one without. Unfortunately the profiles will be a mixed batch, but most will be for the non-sonar binary, so you can just use that and expect some random noise/gibberish from the profiles taken using the sonar binary.

Ah cool, I'll try that.

Does it consume 100% CPU?

It does.

Is it go-fuzz or the subprocess?

The go-fuzz process doesn't seem to use much CPU most of the time, but the children do.

Profiling one/both with perf may shed some light.

Aye.

So I pimped my fuzz/fuzz.go like this:

package fuzz

import (
	"os"
	"bytes"
	"runtime/pprof"
	"github.com/purpleidea/mgmt/lang"
	"github.com/google/uuid"
)

func Fuzz(data []byte) int {
	f, _ := os.Create("./profiles/" + uuid.New().String())
	defer f.Close()
	pprof.StartCPUProfile(f)
	defer pprof.StopCPUProfile()

	lang.LexParse(bytes.NewReader(data))
	return 0
}

I repackaged and ran the fuzzer for ~five minutes, then generated a combined PDF via go tool pprof --pdf ../fuzz-fuzz/sonar.exe * > all.pdf. I had to delete a couple of zero-byte profiles for pprof to stop complaining. The result is this: go-fuzz.pdf

Repeating that test using my custom main-fuzz.go with 100000 calls to Fuzz results in this: main1.pdf, which is... not helpful. The PDF is empty; it seems the individual Fuzz calls are too quick and none of the profile files contain any information.

While this is not quite comparing apples to apples I made some tests with my main looking like this:

package main

import (
	"os"
	"log"
	"io/ioutil"
	"github.com/purpleidea/mgmt/fuzz"
	"runtime/pprof"
)

func main() {
	f, err := os.Create("./profile1")
	if err != nil {
		log.Fatal("could not create CPU profile: ", err)
	}
	defer f.Close()
	if err := pprof.StartCPUProfile(f); err != nil {
		log.Fatal("could not start CPU profile: ", err)
	}
	defer pprof.StopCPUProfile()

	dat, _ := ioutil.ReadFile("./fuzz/corpus/third.mcl")
	for i := 1; i <= 100000; i++ {
		fuzz.Fuzz(dat)
	}
}

The result looks like this: main2.pdf

Hope you can help me make heads or tails of this. Due to the empty profiles in my second test, I'm also suspicious about how much we can trust the first test's results.

There seems to be some problem with synchronization/scheduling. go-fuzz sets GOMAXPROCS to 1 somewhere; that may be the root cause. Try calling runtime.GOMAXPROCS(1) in your standalone test as well. And comment out the GOMAXPROCS(1) in go-fuzz. These two experiments should confirm or refute the hypothesis.
Where is the source code for NewLexerWithInit, and what does it do with select? I can't find it; it does not exist in the sources...

Is there some kind of active spinning involved? That would explain the manifestation.

It seems the individual Fuzz calls are too quick and none of the profile files contain any information.

Bummer. I’ll see about adding proper profiling support to go-fuzz. May take a few days.

In the meantime, if Dmitry's suggestions don't help, you can try caveman profiling: start a large number of procs, use killall to send SIGQUIT to all of them, and look for commonalities in the backtraces. If a single function dominates the runtime, it will also dominate randomly selected backtraces.

There seems to be some problem with synchronization/scheduling. go-fuzz sets GOMAXPROCS to 1 somewhere; that may be the root cause. Try calling runtime.GOMAXPROCS(1) in your standalone test as well.

Adding runtime.GOMAXPROCS(1) to my main does indeed seem to confirm your suspicions. Now it performs just as badly as go-fuzz: main3.pdf

And comment out the GOMAXPROCS(1) in go-fuzz. These two experiments should confirm or refute the hypothesis.

Unfortunately commenting out the GOMAXPROCS from go-fuzz does not fix the issue. It looks like this: go-fuzz2.pdf

And this is the diff:

diff --git a/go-fuzz-dep/main.go b/go-fuzz-dep/main.go
index 09bb3c5..10ab604 100644
--- a/go-fuzz-dep/main.go
+++ b/go-fuzz-dep/main.go
@@ -7,7 +7,7 @@
 package gofuzzdep

 import (
-       "runtime"
+       // "runtime"
        "sync/atomic"
        "syscall"
        "time"
@@ -21,7 +21,7 @@ func Main(fns []func([]byte) int) {
        CoverTab = (*[CoverSize]byte)(unsafe.Pointer(&mem[0]))
        input := mem[CoverSize : CoverSize+MaxInputSize]
        sonarRegion = mem[CoverSize+MaxInputSize:]
-       runtime.GOMAXPROCS(1) // makes coverage more deterministic, we parallelize on higher level
+       // runtime.GOMAXPROCS(1) // makes coverage more deterministic, we parallelize on higher level
        for {
                fnidx, n := read(inFD)
                if n > uint64(len(input)) {
diff --git a/go-fuzz/main.go b/go-fuzz/main.go
index 123c253..32acca2 100644
--- a/go-fuzz/main.go
+++ b/go-fuzz/main.go
@@ -71,7 +71,7 @@ func main() {
                os.Exit(0)
        }()

-       runtime.GOMAXPROCS(min(*flagProcs, runtime.NumCPU()))
+       // runtime.GOMAXPROCS(min(*flagProcs, runtime.NumCPU()))
        debug.SetGCPercent(50) // most memory is in large binary blobs
        lowerProcessPrio()

Where is the source code for NewLexerWithInit, and what does it do with select? I can't find it; it does not exist in the sources...

It seems to come from nex, the lexing library used by mgmt: https://github.com/blynn/nex#reference. Maybe the issue is already within nex.

I think this is the bug:
https://github.com/blynn/nex/blob/master/nex.go#L898-L906
If neither channel is ready, it will actively spin, burning CPU. This is never a good idea.

It seems the default case can simply be deleted. Why is it there?
And if only true is ever sent to ch_stop, then more code can be deleted. It looks somewhat over-complicated.

Bingo! After deleting the default case, I jump to over 10k execs per second using stock go-fuzz, even with more complex inputs :) The individual Fuzz calls are now so fast that all the profiles I write are empty, but I guess we can consider it fixed anyway.

Do you want to create a PR upstream or do you want me to do it?

Thanks for the patience!

Please take it from here.