proposal: cmd/go: make fuzzing a first class citizen, like tests or benchmarks

Question

proposal: cmd/go: make fuzzing a first class citizen, like tests or benchmarks

bradfitz opened this issue 9 years ago · 116 comments

Filing a proposal on behalf of @kcc and @dvyukov:

They request that cmd/go support fuzzing natively, just like it does tests and benchmarks and race detection today.

https://github.com/dvyukov/go-fuzz exists but it's not as easy as writing tests and benchmarks and running "go test -race" today.

Should we make this easier?

Motivation
Proposal

Answer 1 · 2017-02-15T19:50:01.000Z

I think it would be easier to evaluate the idea if it were slightly less abstract.

For example:

_test.go are permitted to contain functions of the form FuzzXxx(f *testing.F, data []byte)
these functions are expected to run some test based on the random bytes in data
errors are reported using the testing.F argument in the usual way
f.Useful() may be called to indicate useful data, i.e., data that parses correctly
f.Discard() may be called to indicate that the data should be discarded
go test -fuzz=. runs the fuzz functions using a regexp like -test and -bench
naturally go test -fuzz must also rebuild the package in fuzz mode
the data is cached somewhere under $GOROOT/pkg, but where?

Answer 2 · 2017-02-15T19:53:01.000Z

@ianlancetaylor, yes, FuzzXxx(f *testing.F, ...) is what this is about. The exact API is probably TBD.

I think the first step before it's designed completely is to determine whether there's interest.

Answer 3 · 2017-02-15T20:30:51.000Z

As a general concept, I'm in favor.

Answer 4 · 2017-02-15T20:40:26.000Z

I would expect that there would be an additional required flag (when fuzzing) where you specify the corpus directory.

Answer 5 · 2017-02-15T20:47:51.000Z

Can we just cache the corpus somewhere under $GOROOT/pkg? Are there cases where a typical user would be expected to modify the corpus themselves?

Answer 6 · 2017-02-15T21:11:14.000Z

I think it's wrong to think of the corpus as strictly a cache. The corpus is the save state of the fuzzer and the documentation for go-fuzz even recommends committing them into a version control system. The pkg directory is treated strictly as cache and it is not uncommon for people to recommend clearing out the directory, which will unfortunately delete the fuzzer state.

A specified corpus is not so much for the user modify the corpus themselves, but for them to specify how to persist the corpus data.

Answer 7 · 2017-02-15T21:37:29.000Z

Could there be some default convention say a _fuzz/xxx directory (where xxx corresponds with FuzzXxx) and a method on the *testing.F object to load a different corpus from the _fuzz/ directory if necessary? It seems like it should just know where the corpus is.

Answer 8 · 2017-02-15T22:31:09.000Z

I'm in favor. Efficient fuzzing usually requires some help from compiler so it's better to built this into std. (compiler instrumentation will be more efficient than go-fuzz's source code instrumentation. I also want to have cmd/cover built on compiler instrumentation to support branch coverage, but that's off-topic to this issue.) How about add some methods to testing.T (or perhaps invent a new testing.Fuzz to replace testing.T, but see below) in fuzz tests? One of the method could be setting the corpus location (we can recommend it to be saved under testdata). but we probably should also a command line flag to override the test's setting (one compromise is to make both optional: Introduce -test.fuzzdir to hold the corpus path for all fuzz tests. If not provided, default to testdata/fuzz (*testing.T).FuzzDir("parser") // optional call to set corpus directory prefix location for this test, if relative, then relative to the -test.fuzzdir value. To make the feature more useful, I suggest we still use testing.T so that it's quite easy to migrate fuzz found test cases into a (table driven) regular test. Making fuzz tests taking a testing.T will facilitate this.

Answer 9 · 2017-02-16T10:38:01.000Z

Quoting @dvyukov

I would appreciate if you drop a line there if you found fuzzing useful and a brief of your success story.

It was very useful for me - found bugs in several lexers.

Answer 10 · 2017-02-16T10:40:32.000Z

I use it regularly on a lexer/parser/formatter for Bash (https://github.com/mvdan/sh).

Having it be a first-class citizen would simplify things for me and for contributors.

Answer 11 · 2017-02-16T10:43:29.000Z

Found a bug in the C decoder for google/brotli by fuzzing a Go implementation of a Brotli decoder.

Also found some divergences in Go bzip2 decoders from the canonical C decoder (this and #18516). All by fuzzing.

Answer 12 · 2017-02-16T11:07:43.000Z

My coworker at DigitalOcean was working on a side project to make fuzzing easier. Check his repo out here: https://github.com/tam7t/cautious-pancake Adding it here as I think it would be a valuable piece of information for this discussion.

Answer 13 · 2017-02-16T11:10:42.000Z

The README for go-fuzz lists a number of "Trophies", ( https://github.com/dvyukov/go-fuzz#trophies ) the majority of which are from the standard library, but about 20% of which are external to the Go standard libraries.

A GitHub search for Go source files with the gofuzz build tag gives ~2500 results: https://github.com/search?l=Go&q=gofuzz&type=Code&utf8=%E2%9C%93

My tutorial on fuzzing ( https://medium.com/@dgryski/go-fuzz-github-com-arolek-ase-3c74d5a3150c ) gets 50-60 "reads" per month (according to medium's stats).

Answer 14 · 2017-02-16T11:16:53.000Z

Feature that would be also important (at least for me) would be ease of turning some selected Fuzz test cases into permanent tests. Simplest way to do it would be exporting the case data in go byte array and calling FuzzXXX function from TestXXX function but if FuzzXXX accepts *testing.F struct type it won't be possible.

Answer 15 · 2017-02-16T12:13:41.000Z

Yes, we've found fuzzing useful in our projects multiple times. Especially sensitive code, the fuzzer will frequently find edge cases that we missed. Encoding, networking, and generally things that depend on user input.

I will say that most of the benefit is usually seen in the first tiny bit of fuzzing. There's a pretty strong diminishing returns as you continue to fuzz, at least that's what we've found.

Answer 16 · 2017-02-16T12:16:07.000Z

As you can understand, I am very supportive for this. Traditional testing is just not enough for modern development speeds. I am ready to dedicate some time to work on parts of this.

Throwing some ideas onto the table:

To flesh out the interface, we don't need to implement coverage nor any actual smartness. The interface should work if we just feed in completely random data, it will be just less useful. But I think it's the right first step. We can transparently add smartness later.
It would be nice to have some default location for corpus, because it will make onboarding easier. The location probably needs to be overridable with go test flag or GOFUZZ env var.
I think it's "must have" that fuzz funciton runs during normal testing. If corpus is present, each input from corpus is executed once. Plus we can run N random inputs.
Thinking how we can integrate continuous fuzzing into Go std lib testing (including corpus management) would be useful to ensure that it will also work for end users in their setups.
go command (or whatever runs fuzz function) might need some additional modes. For example, execute 1 given input, useful for crash debugging. Or, run all programs from corpus and dump code coverage report.
I am ready to give up on f.Useful() and f.Discard() for simplicity (as far as I understand that come from go-fuzz return values). They were never proven to be useful enough. For Discard Fuzz function can just return. And fuzzer can try to figure out Useful automatically.
In some cases Fuzz function needs more than just []byte. For example, regexp test needs the regular expression and a string to match. Other tests may need some additional int's and bool's. It's possible to manually split []byte into several strings and also take some bits as int's and bool's. But it's quite inconvenient and can negatively affect fuzzing efficiency (fuzzer can do better if it understands more about input structure). So we could consider allowing Fuzz function to accept a set of inputs with some restrictions on types, e.g. FuzzFoo(f *testing.F, s1, s2 string, x int, b bool). But this can be added later as backwards compatible extension. Just something to keep in mind.
An alternative interface could be along the following lines:

func FuzzFoo(f *testing.F) {
  var data []byte
  f.GetRandomData(&data)
  // use data
}

GetRandomData must be called once and always with the same type.
Since the function now does not accept the additional argument, we can make it a normal test:

func TestFoo(t *testing.T) {
  var data []byte
  testing.GetRandomData(&data)
  // use data
}

This recalls testing/quick interface considerably, so maybe we could just use testing/quick for this.
go tool will need to figure out that this is a fuzzing function based on the call to testing.GetRandomData.

Answer 17 · 2017-02-16T12:28:06.000Z

I will say that most of the benefit is usually seen in the first tiny bit of fuzzing. There's a pretty strong diminishing returns as you continue to fuzz, at least that's what we've found.

That's true to some degree, but not completely. It depends on (1) complexity of your code, (2) rate of change of your code, (3) smartness of the fuzzer engine. If your code is simple and doesn't change, then fuzzer will find everything it can in minutes. However, if your code change often, you want to run fuzzing continuously as regression testing. If your code is large and complex and fuzzer is smart enough, then it can manage to uncover complex bugs only after significant time.
One example is this bug in OpenSSL bignum asm implementation that we've found after several CPU years of fuzzing: https://github.com/google/fuzzer-test-suite/tree/master/openssl-1.1.0c
Another example is our Linux kernel fuzzing which uncovers bugs at roughly constant rate over more than a year (due to complexity of the code and frequent changes): https://github.com/google/syzkaller/wiki/Found-Bugs

Answer 18 · 2017-02-16T14:38:15.000Z

I'm fine with fuzzing, but the problem is that if you vendor in a library that fuzzes, then... you inherit all their corpus. So, I'm not a fan of corpus being checked into the project.

Case in point:

Overall, I think fuzzing is a must have. Glad to see a proposal to make it easier.

Answer 19 · 2017-02-16T15:20:10.000Z

To confirm @dvyukov in #19109 (comment) , it would be really nice to have supported types other than []byte. We found bugs in both the gonum/blas implementation and the OpenBLAS library using fuzzing. It's possible to use go-fuzz, but it's kind of a pain to parse the []byte directly, (https://github.com/btracey/blasfuzz/blob/master/onevec/idamax.go).

Answer 20 · 2017-02-16T16:40:57.000Z

Suggest it goes under the subfolder testdata. Then any tools that ignore tests will also ignore this dir.

Answer 21 · 2017-02-16T20:24:34.000Z

@dvyukov

I think it's "must have" that fuzz funciton runs during normal testing. If corpus is present, each input from corpus is executed once. Plus we can run N random inputs.

I have concerns about how much time this is going to add to testing. My experience with fuzzing is that compiling with the fuzz instrumentation takes a significant amount of time. I'm not sure this is something we want to inflict upon every use of go test.

Answer 22 · 2017-02-16T21:19:44.000Z

@dsnet to execute corpus and check if it doesn't fail instrumentation isn't needed. Instrumentation is needed when you want to expend/improve the corpus.

Answer 23 · 2017-02-16T21:44:53.000Z

Should there be a story to make it easy to use external fuzzing engines?

Answer 24 · 2017-02-16T22:04:36.000Z

@Kubuxu, I'm comfortable with running the Fuzz functions as a form of test without special instrumentation, but Dmitry comment suggested running with N random inputs, which implies having the instrumentation involved.

Answer 25 · 2017-02-16T22:20:01.000Z

My 2c (I am utterly ignorant about Go, but have some ideas about fuzzing)

There are several major parts in coverage-guided fuzzing as I can see it:

instrumentation
interface
fuzzing engines' logic (how to mutate, choose elements to add to the corpus, etc)
integration with the rest of Go testing infra (I won't comment on this one -- no opinion)

Instrumentation is better to be done in the compiler, this way it's the most efficient and easy to use.
In LLVM we have these two kinds of instrumentation used for guided fuzzing:
https://clang.llvm.org/docs/SanitizerCoverage.html#tracing-pcs-with-guards (control flow feedback)
https://clang.llvm.org/docs/SanitizerCoverage.html#tracing-data-flow (data flow feedback)

The interface must be as simple as possible. For C/C++ our interface (which we use with libFuzzer, AFL, hoggfuzz, and a few others) is:

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
  DoSomethingInterestingWithMyAPI(Data, Size);
  return 0;  // Non-zero return values are reserved for future use.
}

and the only thing I regret is that the return type is not void.
IMHO, for the first version of the interface for Go fuzzing, the go-fuzz approach is perfect:

func Fuzz(data []byte) int

(again, not confident about int return value)

Fuzzing engines and the interface should be independent.
It should be possible to plug any fuzzing engine (not necessary written in Go) to fuzz a Go fuzz target.
Such fuzzing engine may need to understand the feedback provided by the instrumentation though.
E.g. I'd love to try libFuzzer/AFL on the Go targets.
And by fuzzing engine we should understand a wider class of tools, including e.g. concolic execution tools.

And, it would be nice to have the new fuzzing engine(s) to behave similar to AFL, libFuzzer, and go-fuzz
so that they are easier to integrate with continuous fuzzing service(s) (e.g. oss-fuzz)

Should there be a story to make it easy to use external fuzzing engines?

Absolutely, see above.

it would be really nice to have supported types other than []byte.

Maybe.
For the really complex data structures our current answer is to use protobufs as the input:
https://github.com/google/libprotobuf-mutator
There is also a middle ground where you need to fuzz e.g. a pair of strings.
But I would rather have a standard adapter from from []byte into a pair of strings than to complicate the interface.

Are there cases where a typical user would be expected to modify the corpus themselves?

Corpus is not a constant. It evolves as long as the code under test changes, fuzzing techniques evolve, and simply more CPU hours are spent fuzzing.
We typically store a seed corpus in RCS, maintain a larger corpus on the fuzzing service,
and periodically merge it back to RCS.

Note: a corpus stored in RCS allows to perform regression testing (w/o fuzzing)

I also want to have cmd/cover built on compiler instrumentation to support branch coverage, but that's off-topic to this issue.

Not too much off-topic.
This approach in LLVM allows us to get various kinds of coverage data with the same compiler instrumentation, by just re-implementing the callbacks.

I'm fine with fuzzing, but the problem is that if you vendor in a library that fuzzes, then... you inherit all their corpus. So, I'm not a fan of corpus being checked into the project.

This is a price worth paying since the corpus often turns out to be a great regression test.

I have concerns about how much time this is going to add to testing. My experience with fuzzing is that compiling with the fuzz instrumentation takes a significant amount of time. I'm not sure this is something we want to inflict upon every use of go test.

If you don't enable fuzzing instrumentation (which won't be on by default, I think) you won't pay for it.

Answer 26 · 2017-02-16T22:29:44.000Z

A separate topic worth thinking about is fuzzing for equivalence between two implementations of the same protocol.

Imagine your code has
func ReferenceFoo(data []byte) SomeType and
func ExperimentalOptimizedFoo(data []byte) SomeType.

Then you can fuzz the following target to verify that the two implementations match:

func Fuzz(data []byte) int {
    if ReferenceFoo(data) != ExperimentalOptimizedFoo(data) {
       panic("ouch!")
    }
    return 0
}

This works pretty well when both things are implemented in Go.
But imagine you are porting to Go something previously written in C.
Here is a write up that describes one possible solution:
https://moderncrypto.org/mail-archive/hacs/2017/000001.html
(in short: have two processes running in parallel and exchanging data via shared memory or some such)

Answer 27 · 2017-02-17T08:01:41.000Z

I love this.

And I think a good solution to the corpus location, like

defaulting to testdata/FuzzXxx/
run (only) the corpus cases w/o flags

would

remove the need to duplicate code to "freeze" certain testcases
avoid sacrificing the API to fit it in a testing.T
be a more elegant solution that doesn't require putting binary data in source files

Projects that don't commit the corpus could use -fuzzcorpus (or similar) when fuzzing, and then copy the test cases they want to run every time in the testdata folder and check them in.

Actual fuzzing could be controlled by -fuzztime (like -benchtime).

Answer 28 · 2017-02-19T01:04:28.000Z

Fuzzing is basically coverage based randomized testing, and we already have a randomized testing package: testing/quick. I think we shouldn't limit ourselves to just fuzzing a []byte, the fuzz function should take arbitrary type supported by testing/quick. i.e. if I'm fuzzing an integer sort routine, I should be able to write: func FuzzIntSort(f *testing.F, input []int) { // ... } Also, if the function takes a struct, this also opens door to use field tags (or even methods) to hint/provide non-trivial data validity requirements to the fuzz engine in order to avoid having the engine discover the restriction itself. Limiting the input to []byte is more suited to protocol/parser fuzzing, but with proper Go tool support, we can do significantly better. Perhaps we can even merge this "coverage-based" part into testing/quick itself. Just some random (no pun intended) idea to think about.

Answer 29 · 2017-02-21T10:51:41.000Z

@dsnet No, I meant just plain random, not instrumentation involved.
Consider that you just wrote a Fuzz function (or checked out some code without corpus), now you can do 'go test' and already get some results from the Fuzz function.

Answer 30 · 2017-02-21T10:54:13.000Z

@kcc re fuzzing for equivalence
Testing Go vs C is simple with cgo, there is an example of testing regexp vs re2 in go-fuzz.
Testing several Go packages against each other is also trivial as there are no name clashes, just import both package.
So I don't think we need to do something special for this in the first version.

Answer 31 · 2017-02-21T11:09:40.000Z

@minux Yes, this is fittable into testing/quick. I can't make my mind as to whether we need to fit it into testing/quick or not yet.
On one hand, (1) quick gives suboptimal interface, (2) we can't support all testing/quick tests (e.g. calling quick.Value twice per test, or with different types, is problematic), (3) we can't have any fuzzing-specific user API since these are just normal tests.
But on the other hand, testing.TB seems to provide everything we need (with t.Skip as "discard this input"). So it would be nice to make them just normal tests with no new APIs.
Any thoughts?

Answer 32 · 2017-02-23T17:14:01.000Z

I like the idea a lot. I caught notice of a presentation by Dmitry (perhaps in the Go Newsletter?) some time ago and watched it and was as amazed and enthusiastic as the audience. I stick pretty close to the standard library though, mostly out of proximity and familiarity and wanting to keep things simple, and only venture out when forced by a clear benefit. All that is to say if fuzzing is as useful as it looked to be and it were in the standard library, I feel it would likely reach many more users of Go.

Answer 33 · 2017-02-25T07:18:19.000Z

With the current go-fuzz I wish I could:

write multiple Fuzz functions in a package. The workaround is to write Fuzz functions in separate packages, but this blocks from fuzzing directly private functions of the package.
reuse functions I already wrote in *_test.go sources for testing in the Fuzz function
reuse existing tests to feed a fuzzing corpus: this must be done "manually" for now (take values in Go code and save them in data files) and it's hard to keep them in sync. For example if I add a new input test case I wish it could be added immediately (without duplicating it) to the fuzzing corpus.
more easily cleanup the irrelevant input from the corpus after a functional change either in the fuzzed code or in the Fuzz function

So integrating fuzzing in *_test.go sources and giving "go test" the options to control them would be great, at least for 1. and 2. !

Answer 34 · 2017-02-25T08:47:13.000Z

Note also that fuzzing is currently mostly used to find edge cases that crash programs. But it could also be used to find worst case scenarios (speed, memory usage) in algorithms: this could then be used for benchmarking. So far we can share code between testing and benchmarking. Fuzzing support should be added along testing, benchmarking to benefit to the trio, not just testing.

Answer 35 · 2017-02-27T21:21:11.000Z

There is still not much technical detail here about what is being proposed. It's hard to evaluate without concrete details of what is wanted. Especially with all the details about extra directories full of input corpus files, this starts to seem something heavy weight.

The open questions seem to be:

What is being proposed?
Does this need to be in the standard distribution? Is there some reason it doesn't work as a third-party tool?

Note that even a third-party tool could look for func FuzzFoo([]byte) functions in *_test.go files. That would allow someone to prototype all of this outside the standard distribution.

Answer 36 · 2017-03-07T03:56:02.000Z

Still waiting for answers to the open questions.

My suggestion for moving forward would be to change go-fuzz to be - as much as possible - exactly what you want the new go test fuzz mode to be. That means that go-fuzz image/png and cd image/png; go-fuzz need to work sensibly, with no arguments or other preparation (currently go-fuzz needs a lot more than that).

The go command already ignores testdata directories, so go-fuzz could assume corpus data in testdata/FuzzName.fuzz or testdata/fuzz/FuzzName.corpus or anything like that. I'm still worried about the corpus data overwhelming the source code repos, but we would find out.

The go command also ignores functions with names other than TestName and BenchmarkName, so the fuzz functions can be written in the test files and will be ignored by the standard tools. I'd suggest that go-fuzz should expected func FuzzFoo(...).

If you want to allow the fuzzed function to take a *testing.F for error reporting in the long term, you could start with using testing.TB instead. Note that you can implement a testing.TB even though it has unexported methods, like this:

type fuzzState struct {
    ... my state ...
    testing.TB
}

func (f *fuzzState) Error(args ...interface{}) { ... }
... etc implementing all the exported methods of testing.TB ...

This way, a *fuzzState implements testing.TB and can be passed to the FuzzFoo functions, the FuzzFoo functions don't need to import anything but "testing" to declare their arguments. Neither can get at the unexported methods, so the fact that they panic (because they invoke methods of the nil embedded TB in fuzzState) doesn't matter.

Answer 37 · 2017-03-09T19:33:21.000Z

Here is full-fledged proposal:
https://docs.google.com/document/u/1/d/1zXR-TFL3BfnceEAWytV8bnzB2Tfp6EPFinWVJ5V4QC8/pub

Answer 38 · 2017-03-11T01:41:21.000Z

This is pretty small, but instead of FuzzXxx(f *testing.F, data []byte) just FuzzXxx(f *testing.F) would be nice, and the data could be part of *testing.F.Data() []byte or some such. It will be more consistent with testing and bench, and I also support calling a method to acquire the data, which would allow the fuzzer to be configured before generating data.

Answer 39 · 2017-03-11T09:38:21.000Z

If you do it that way it's also really easy to add support for more types later, F.Int() and such

Answer 40 · 2017-03-12T12:30:13.000Z

@omeid @DavidVorick
One thing that I wanted to allow is future extension to:

func FuzzRegexp(f *testing.F, re string, data []byte, posix bool) { ... }

and:

func FuzzIntSort(f *testing.F, input []int) { ... }

Another aspect is that fuzzer needs a stable structure of the input. We can't allow calling f.Data() on one invocation and f.Int() on another; or calling multiple of them depending on some dynamic conditions in the fuzz function.

One alternative that was mentioned in this issue is:

func TestFoo(t *testing.T) {
  var data someStruct
  testing.GetRandomData(&data)
  // use data
}

This can work (provided that user is required to call GetRandomData once and only once and always with the same type).
But I don't see why this alternative is considerably better than what I described in the proposal. The proposed approach is more concise and intuitive I think.

Answer 41 · 2017-03-12T13:12:46.000Z

~~One question, if signature conaines []byte, how its length is determined?~~

Scratch that, I just reminded myself how the fuzzing works currently with go-fuzz, it will try longer and longer inputs if it gets a failure for shorter ones.

Answer 42 · 2017-03-13T20:03:56.000Z

Is there some reason this can't be prototyped outside the main repo?

Answer 43 · 2017-03-13T20:36:06.000Z

It seems like the best way forward would be to migrate go-fuzz to be as close as possible to the intended command-line for the standard library. For example, right now you have to do go-fuzz-build and then go-fuzz. It should be go-fuzz [options] [package] just like go test, one step. Also it can add support for non-[]byte fuzzing.

We can even try adding Fuzz functions recognized by go-fuzz into the _test.go files in the main repo, along with testdata/fuzz subdirectories for corpus. Starting to do that will make clear exactly how heavy-weight this might turn out to be. (We don't want to bloat the main repo if this turns out to be a lot of files or bytes.)

I understand we can't do *testing.F this way, but testing.TB is a decent start I think (it gets you logging at least). See my comment above for how to make that work.

Answer 44 · 2017-03-15T13:34:44.000Z

@dvyukov I like your proposal.
Some points need clarification:

-fuzzdir value: system filepath (absolute or relative) or package path? dir implies the former.
-fuzzinput value: filename (relative the corpus directory) or filepath or raw input?

Also I think that some parameters to limit the global fuzzing duration could be useful when running fuzzing in continuous integration build: specify to run the fuzzing function at most n times (go test has -count) or for this duration (-fuzzduration?) and giving multiple such limits would make fuzzing stop when the first limit is reached.

We should also consider the exit code returned by go test -fuzz when limits have been reached. What about returning a special exit code if the corpus has grown (as this requires user attention)?

Update: about -fuzzduration, I see that @FiloSottile suggested -fuzztime above.

Answer 45 · 2017-03-15T13:47:20.000Z

@dvyukov There is a single point in your proposal with which I disagree. It is in this paragraph:

go test runs fuzz functions as unit tests. Fuzz functions are selected with -run flag on par with tests (i.e. all by default). Fuzz functions are executed on all inputs from the corpus and on some amount of newly generated inputs (for 1s in normal mode and for 0.1s in short mode). For that matter, -fuzzdir flag can be specified without -fuzz flag.

I think that running Fuzz functions during go test using the existing corpus would be a great feature and would be a good reason to integrate fuzzing into go test. But I disagree with the idea of fuzzing when the -fuzz flag is not given. go test should keep its deterministic result.

Answer 46 · 2017-03-15T13:57:57.000Z

@rsc go-fuzz-build is currently quite long. Especially because package instrumentation is not cached (no go install for go-fuzz). And it is convenient to be able to stop fuzzing and restart later without enduring the build step again.
Here are some result on one $work project:

$ time go build

real	0m1.798s
user	0m2.200s
sys	0m0.140s
$ time go-fuzz-build $(go list)

real	1m13.105s
user	1m20.688s
sys	0m19.376s

So integrating go-fuzz-build with go-fuzz would not yet be a good step forward.

Answer 47 · 2017-03-15T14:52:32.000Z

Also currently go-fuzz coverage instrumentation doesn't work with CGO builds which happen even when using some functions from stdlib. This might be solvable by epxosing coverage engine used in go test -cover for go-fuzz.

Answer 48 · 2017-03-15T17:33:13.000Z

@Kubuxu Note that the coverage engine of go test -cover is line-based. This is not precise enough for serious fuzzing as all various branches on a single source line are not counted separately. However I don't know if go-fuzz instrumentation is different.

Answer 49 · 2017-03-16T08:43:27.000Z

@dolmen

-fuzzdir value: system filepath (absolute or relative) or package path? dir implies the former.

It says directory, it's the same phrasing that all other go command flags use. How do you propose to change it?

-fuzzinput value: filename (relative the corpus directory) or filepath or raw input?

Added "Flag value specifies path to a file with the input".

Also I think that some parameters to limit the global fuzzing duration could be useful when running fuzzing in continuous integration build: specify to run the fuzzing function at most n times (go test has -count) or for this duration (-fuzzduration?) and giving multiple such limits would make fuzzing stop when the first limit is reached.

What scenario do you have in mind? We did not hit need in such functionality in any of our setups.

We should also consider the exit code returned by go test -fuzz when limits have been reached. What about returning a special exit code if the corpus has grown (as this requires user attention)?

What is the scenario/workflow? Just don't want to over-engineer it. It's easy to add features later, impossible to remove.
Set of things that it may need to communicate can be large (new inputs, new crashes, did not get any coverage, other errors). And in most cases the process is killed, do it does not have a chance to communicate exit status.

Answer 50 · 2017-03-16T08:48:10.000Z

@dolmen

I think that running Fuzz functions during go test using the existing corpus would be a great feature and would be a good reason to integrate fuzzing into go test. But I disagree with the idea of fuzzing when the -fuzz flag is not given. go test should keep its deterministic result.

Good point.
What do others think? I am open to discussion.
The situation that I afraid of is that if it's difficult to enable (and in some cases difficult means adding one flag, because e.g. people may not know about it), then it will be underused.
Lots of people also explicitly expressed interest in using the corpus as base of regression tests and running it on every 'go test' invocation.
We have options of running/non-running random inputs on top of corpus; and maybe changing behavior based on presence of -short flag.

Answer 51 · 2017-03-16T08:48:46.000Z

However I don't know if go-fuzz instrumentation is different.

It's more or less the same as go tool cover.

Answer 52 · 2017-03-23T16:55:00.000Z

Just a small side-point in support of this is that it'll serve as an example and motivator to other languages to treat fuzzing as first-class for those who need the extra convincing.

Answer 53 · 2017-03-27T20:19:41.000Z

To respond to the point about not caching build artifacts, we want to fix that regardless. Assuming that's fixed, it still seems like the right next step is to make 'go-fuzz' the separate command as close to 'go fuzz' the proposed standard command as possible, and to add fuzz tests to at least the x subrepos and maybe the standard library, so that we can understand the implications of having them in the source repos (including how much space we're going to spend on testdata/fuzz corpus directories).

Putting this on hold until go-fuzz is more like the proposed 'go fuzz'.

Answer 54 · 2017-03-28T12:00:06.000Z

@rsc

To respond to the point about not caching build artifacts, we want to fix that regardless.

Please can you link to the relevant issue that is tracking this? I'm interested in following the results of that discussion.

Thanks

Answer 55 · 2017-08-18T00:52:51.000Z

It's unclear what would be the behavior when multiple fuzz functions are selected, so for now we restrict it to only one function. Selecting multiple packages is not supported either. The restriction can be removed when/if we figure out a sane behavior for this.

Speaking from the user viewpoint, not from the implementation viewpoint:

Isn't the sane behavior to just run all that was requested? Likely with some sort of batched fair interleaving, so that the first N seconds of execution gets some exposure for all of them (might not want actual concurrency, it's probably more cache-efficient to target one thing at a time, at least per core).

Now, for the implementation reality: This may not be feasible until the go-fuzz-build step is integrated.

Answer 56 · 2017-08-18T05:23:45.000Z

@tv42 note that other flags (in particular -fuzzdir) most likely need to have different values per package/function.

Answer 57 · 2017-08-18T18:41:34.000Z

@dvyukov Good point, but isn't that sort of true for some other go test flags too? For example, if multiple packages' tests tried to write a profile to the same file. As far as I see, the fix is "well don't do that, then".

Answer 58 · 2017-08-18T18:49:38.000Z

@tv42 but here it applies even to functions within a package.

Answer 59 · 2017-08-18T22:25:45.000Z

@dvyukov You are correct. One way out would be to take the parent dir as the argument, and enforce FuzzXxx naming underneath that.

Answer 60 · 2017-12-03T07:28:17.000Z

Quick summary of an aspect of this to see if I am tracking properly:

@rsc wrote above:

It seems like the best way forward would be to migrate go-fuzz to be as close as possible to the intended command-line for the standard library. For example, right now you have to do go-fuzz-build and then go-fuzz. It should be go-fuzz [options] [package] just like go test, one step.

@dolmen later I think wrote in response:

go-fuzz-build is currently quite long. Especially because package instrumentation is not cached (no go install for go-fuzz). And it is convenient to be able to stop fuzzing and restart later without enduring the build step again. <...snip..> So integrating go-fuzz-build with go-fuzz would not yet be a good step forward.

And then @rsc later replied (I think?) to that, saying:

To respond to the point about not caching build artifacts, we want to fix that regardless

And then there was a follow-up question asking what caching issue that was referencing.

Question 1: Was @rsc (hopefully) referencing the build caching work that he recently landed in 1.10?

Question 2: If so, was this potential work to start prototyping (outside the main repo) make fuzzing a first class feature effectively waiting for that 1.10 build caching work to land? (Recall that back in Feb, @dvyukov had said above "I am ready to dedicate some time to work on parts of this", where that statement I think had helped with the github emoji party on this issue).

Question 3: And if that is correct, now that 1.10 build caching is at least mostly working, is the belief that the 1.10 build caching work should at least in theory help with the performance of re-running the work of the go-fuzz-build step such that it would now be more practical to at least consider prototyping collapsing go-fuzz down to a single step?

Answer 61 · 2017-12-03T08:03:41.000Z

Sorry, a little more.

Regarding one other aspect of this, @rsc wrote above:

We can even try adding Fuzz functions recognized by go-fuzz into the _test.go files in the main repo, along with testdata/fuzz subdirectories for corpus. Starting to do that will make clear exactly how heavy-weight this might turn out to be. (We don't want to bloat the main repo if this turns out to be a lot of files or bytes.)

I'm not 100% sure if @rsc's concern was more about total volume regardless of where the data lives, vs. more about concern about direct impact on the main repo itself.

In case it is the latter, I wanted to briefly highlight that higher up in the comments (in the initial design doc added here), @dvyukov had written that he was at least proposing a separate repo distinct from the main repo, I think:

For the standard library it is proposed to check in corpus into golang.org/x/fuzz repo

Given the concern about file count and size, and given the current go-fuzz already has fuzzed a bunch of the stdlib and has already checked in various corpus into the go-fuzz repo for many stdlib packages, I was curious enough to check to see what the file counts and sizes look like in the go-fuzz repo (including I've previously poked around in there to "borrow" sections of the corpus from the go-fuzz repo to seed some fuzzing of some of our C++ code with the afl fuzzer).

In the "examples" directory of the go-fuzz repo, there are ~70 directories (listed below) that seem to have fuzzing functions (at least based on some greps and other spot checking -- I didn't manually visit every example just now), with many of them having corpus checked into subdirectories.

The first caveat though is that not all of those "examples" directory are for the stdlib (e.g., you can see third-party packages like gorillamux listed below). The second caveat is that some (such as text/template) currently have a fuzzing function but no checked-in corpus.

For the list below, in roughly the middle of the pack you have things like csv with ~250 files totaling ~0.8MB in the csv corpus, vs. the largest listed here is goast, which has ~22MB and ~11K files of goast testdata + corpus.

Finally, hopefully it's not considered too much to paste in this entire list here. Hopefully a quick scan of this is sufficient to get at least a first-pass feel for current sizes of corpus data in go-fuzz repo (assuming I'm not 100% screwing up and/or looking in the wrong spot; if so, I'm hoping someone will correct me).

`go-fuzz` examples directories (fuzzing functions + corpus)

  SIZE      FILES   DIR
  --------  -----   ------------
  0.001 MB      1   html
  0.003 MB      3   pem
  0.004 MB      1   httpserver
  0.004 MB      1   idna
  0.004 MB      1   texttemplate
  0.004 MB      1   ttf
  0.005 MB      2   jsonrpc
  0.005 MB      2   suffixrray
  0.005 MB      2   testcover
  0.008 MB      1   gotypes
  0.008 MB      1   newparser
  0.009 MB      6   test
  0.010 MB      4   aes
  0.010 MB      7   gopacket
  0.010 MB      7   snappy
  0.014 MB      8   time
  0.017 MB     13   elliptic
  0.020 MB      8   flatbuffers
  0.021 MB     17   nss
  0.038 MB     19   httpresp
  0.040 MB     30   freetype
  0.047 MB     33   truetype
  0.110 MB     70   lzw
  0.177 MB     34   bmp
  0.311 MB    191   stdhtml
  0.340 MB    223   flate
  0.423 MB    252   url
  0.426 MB    274   http2
  0.431 MB    132   bzip2
  0.479 MB    330   asn1
  0.538 MB    344   gzip
  0.566 MB    308   zlib
  0.609 MB      2   macho
  0.631 MB    376   mime
  0.634 MB    394   flag
  0.698 MB    457   strings
  0.781 MB    463   websocketclient
  0.794 MB    265   csv
  0.815 MB    513   path
  1.115 MB    227   tar
  1.234 MB    729   bson
  1.271 MB    588   gif
  1.299 MB     71   trace
  1.308 MB    815   xml
  1.316 MB    649   mail
  1.385 MB    232   smtp
  1.396 MB    591   multipart
  1.645 MB    973   protobuf
  1.720 MB    523   tlsclient
  1.943 MB    301   webp
  2.121 MB   1356   parser
  2.121 MB   1356   sqlparser
  2.324 MB    216   tiff
  2.529 MB   1555   fmt
  2.584 MB   1582   gob
  2.812 MB    269   png
  2.815 MB   1094   gorillamux
  2.997 MB   1729   json
  3.042 MB    300   jpeg
  3.860 MB   1784   x509
  4.500 MB   2796   asm
  4.627 MB   2944   gofmt
  4.675 MB   1161   websocketserver
  5.996 MB   3034   tls
  6.008 MB     13   elf
  6.364 MB    421   zip
  6.746 MB   3289   webdav
  7.056 MB   2675   httpreq
  7.845 MB   5423   regexp
  9.242 MB   5725   htmltemplate
 21.710 MB  11647   goast

Answer 62 · 2017-12-03T12:10:51.000Z

Question 2: If so, was this potential work to start prototyping (outside the main repo) make fuzzing a first class feature effectively waiting for that 1.10 build caching work to land?

No, it's waiting on somebody do the work.

Question 3: And if that is correct, now that 1.10 build caching is at least mostly working, is the belief that the 1.10 build caching work should at least in theory help with the performance of re-running the work of the go-fuzz-build step such that it would now be more practical to at least consider prototyping collapsing go-fuzz down to a single step?

I think it should work the same way as testing: 2 separate steps, but by default they are executed together (but if it works better as 2 separate steps in some context, one can always do that).
In general, I think that performance should not play any major role in user interface design. There is no fundamental reason why building for fuzzing is slower than building normal binaries (or race binaries). The current slowness is no more than implementation issue.

Re storing corpus in repo/outside of repo. The implementation must not fix a single location for corpus. There always will be a choice of storing it in repo, in separate repo, somewhere else, or whatever.
Re location of corpus for standard library: I am pretty sure we don't want to store it in main corpus. It will be just too much churn. Given that go command will not fix the location, we can try both options if one sees that storing in repo can work better.

The bottom line: I don't see what is that that we necessarily need to prototype outside of standard library and block the whole thing on it. But I agree prototyping is definitely nice (provided we have infinite developer resources).

Answer 63 · 2017-12-04T06:02:53.000Z

I wonder if some people in the community had reactions to this proposal that included something like:
"Awesome! Love it... but sounds like some deep go internals wizardry would be needed to do anything here."

And on the other hand, I wonder if some of the people with deeper skill sets might be wondering whether or not it makes sense to personally invest time (especially in the face of competing interests) to work on a larger prototype in what might be viewed as speculative effort when there might be questions about the official level of interest in this project getting to the finish line.

In terms of @rsc's suggestion to "migrate go-fuzz to be as close as possible to the intended command-line for the standard library", I wanted to throw out a possible smaller next step for consideration, with a goal of possibly fleshing things out more concretely and gathering data/insight.

Short version: I'm wondering if it might be possible to start with a VERY simple first cut prototype that might not require much changes to go-fuzz and go-fuzz-build to start:

The first cut prototype could largely be a wrapper that presents the next gen UX via the wrapper's CLI, but the wrapper uses exec.Command or similar to invoke unmodified (or maybe mostly unmodified) go-fuzz-build and go-fuzz commands, along with passing through some incoming user commands from the wrapper's CLI to go test.
- In other words, the wrapper would be responsible for getting some meta data and setting things up so that the heavy lifting / actual 'real' work could be done by the already existing tools.
If the first cut prototype is simple enough, it might mean the community could help more in terms of getting the ball rolling...

Answer 64 · 2017-12-04T06:52:37.000Z

And here is a Longer Version (sorry!)

This expands on my immediately prior comment -- all in the context of a VERY basic first cut prototype.

Maybe this doesn't make sense, but in the interests of having a strawman for others to improve upon and/or send alternative ideas, perhaps something like this in terms of chopping things up into some simpler relatively bite-sized chunks for an initial rough cut prototype that tries to avoid changing go-fuzz and go-fuzz-build as much as possible:

Step 0.1 For the stdlib corpus location, it could be a no-op to start.
- In other words, in the interests of getting a lightweight skeleton working end-to-end as simply as possible, in terms of repos, perhaps to start have corpus location be exactly where it is already -- initially in go-fuzz repo (rather than golang.org/x/fuzz or whatever).
Step 0.2 Bring exactly 1 sample Fuzz() func from go-fuzz repo into an actual main repo _test.go file as FuzzFoo() or whatever name
- By itself, that shouldn't actually have any effect (because as @rsc said, 'go test' ignores func names func it finds other than ones starting with Test or Benchmark)
Step 0.3 Identify the minimum set of hopefully very targeted changes that could be applied to the core go-fuzz repo to enable the lightweight first cut wrapper prototype to at least get to a "hello world" fuzz
- With luck, this might be no changes required to go-fuzz or go-fuzz-build commands.
- Or, perhaps it might require some additional command-line flags for go-fuzz-build and/or perhaps some modest tweaks to behavior for go-fuzz-build?
- Some targeted go-fuzz / go-fuzz-build changes that might be needed:
  - Ability to supply a particular fuzz function name like FuzzFoo to go-fuzz-build.
    - Luckily, looks like go-fuzz-build command already has an optional flag named -func to control the fuzzing function name, which defaults to 'Fuzz' but which seems to work when set to other things.
  - Ability for a *_test.go file containing a fuzzing function like FuzzFoo to get picked up by go-fuzz-build.
  - Getting everything properly staged for a subsequent go-fuzz invocation.
  - It could be that if we can make go-fuzz-build happy enough, it might turn out that nothing needs to change for the go-fuzz command invocation.
  - I have a few more notes on this that I will hopefully post later...
Step 0.4 Implement those hopefully minimal changes to the core go-fuzz repo
Step 0.5 One or two people from the broader community could then make 1-2 new and completely independent go-cmd-fuzz-prototype repos (or whatever better name) that have a relatively simple wrapper layer on top of go-fuzz, go-fuzz-build, go test. First cut goals could be:
- Support the next gen UX , perhaps initially focusing on 'run my corpus' and 'fuzz new random cases for me'. Not 100% sure, but I think that might translate to these CLI options:
  - -fuzz regexp (Create new random test cases and repeatedly execute the fuzz function matching this regexp).
  - -fuzzdir dir (Store fuzz artifacts in the specified directory)
  - and a plain 'test' does not create new random test cases, but does execute any existing corpus input files (deterministically and quickly)
  - and the other pre-existing standard test options are passed through to the real 'go test' that gets invoked (though to start some of the less relevant CLI options could start as 'NOT YET IMPLEMENTED' rather than passing them through to 'go test').

Wouldn't need to be that exact order of course...

But maybe something like that would be sufficient to get a rough skeleton working end-to-end? And maybe that could then help with the insight/data gathering?

And once something is working end-to-end, some other bits could get start to get filled in later, such as:

[LATER] start moving a stdlib corpus to golang.org/x/fuzz or wherever.
[LATER] bring some additional sample Fuzz() funcs from go-fuzz repo into some more *_test.go in main repo and/or golang.org/x
[LATER] change from current Fuzz(data []byte) int signature to a testing.TB signature
- To start, rather than changing go-fuzz at the start to support a testing.TB API, instead start with just using a normal panic in the Fuzz() functions similar to what the currently do, which could hopefully be sufficient to get things working end-to-end (e.g., just do things like panic("non-idempotent decoding") or panic("failed to round-trip properly" to start, and to start don't yet implement logging via testing.TB or testing.T)
- Hopefully changing this signature during the prototyping process wouldn't be "terrible"?
  - To start, no one will be using the new Fuzz() API aside from the prototype
  - @rsc had proposed evolving API from testing.TB to *testing.F as part of this process, so starting at Fuzz(data []byte) int signature is a similarly an additional step (rather than using testing.TB to bootstrap things).
  - Requiring using testing.TB at the very start would require more changes to go-fuzz / go-fuzz-build before something would be working end-to-end. It's all possible, of course, but I'm crossing my fingers and hoping for as little changes to go-fuzz repo at the very beginning in order to make it as easy as possible to get the ball rolling...
[LATER] other bells and whistles (like -parallel n, -timeout t, -v, -c/-i, -fuzztime to mirror -benchtime, -short vs. not (e.g., 1 sec vs. 0.1 sec of execution or whatever it ends up being),
[MUCH LATER] Probably wait for the larger 'official' project for things like -fuzzinput, -fuzzminimize, -coverprofile, etc., etc., etc.

Finally, at this point, personally I'm most intrigued to see what could be done to get to a "proposal-accepted" (or rejected) stage. I know full well that there are then larger projects after that, including @dvyukov had outlined 3-4 stages for the real project in the proposal doc above.

Answer 65 · 2017-12-04T07:51:41.000Z

My main question: what aspects do we want to shake out with another prototype?

Answer 66 · 2017-12-04T18:11:10.000Z

[LATER] change from current Fuzz(data []byte) int signature to a testing.TB signature

This can be done in other way, by introducing custom structure in custom go-fuzz and in future aliasing it to testing.F. This is how the x/net/context.Context was migrated.

Answer 67 · 2017-12-24T08:29:47.000Z

One thing that popped up after discussions re OSS-Fuzz integration:
OSS-Fuzz would prefer if a fuzzer binary exits with non-zero status on first bug found. But for manual local runs one would prefer it to continue until explicitly stopped. So potentially we may need a flag for this. Could reuse -count (i.e. "find that many crashes"); or -short ("do a short run")?

Answer 68 · 2018-07-20T08:31:58.000Z

If you want to go with source-to-source transformation outside of go toolchain, we need at least support for such s2s transformations in the go command. It's not possible to create and typecheck C packages outside of go tool. I am not sure what the interface for this should be, because it must not be just a command that accepts a file and produces a modified version of the file (e.g. go build -transform=mytool), the command needs to typecheck the package, including all dependencies and C package and be able to locate all these dependencies (already transformed versions of these dependencies).
On top of that there is constant stream of vendor, internal, modules, etc. It's not possible to do source-to-source transformation of Go code today.
Don't know if such support will be useful for anything else. Maybe. It enables a simple way to do complex and powerful things.

Answer 69 · 2018-07-20T18:46:01.000Z

As someone on a completely different project that actually does implement Go source-to-source transformations on nocgo-only code, having Go toolchain support for this would be really appreciated, but I understand there would be a lot of challenges.

On top of that there is constant stream of vendor, internal, modules, etc. It's not possible to do source-to-source transformation of Go code today.

I know that pain. Add to that the differences of the bazel/blaze/buck Go toolchain (and the differences in Go rule implementations between bazel, blaze, and buck). See this code, which rewrites Go commands to be Go packages (blaze Skylark rules exist, not yet open source as bazel Go rules are very different).

Answer 70 · 2018-07-20T18:53:27.000Z

@hugelgupf can you describe what you are doing with source-to-source transformation? It would be really useful to have more than one real use case when designing such support.

Answer 71 · 2018-07-20T19:59:08.000Z

I should write this up in a readme in our project.

We take multiple Go commands and compile them into one binary, busybox-style.

This means we take Go commands' source and do the following source-to-source transformation:

cmds/foo/foo.go:

package main

import (
  "flag"
  "log"
)

var global = flag.String("name", "", "")

func init() {
  log.Printf("init")
}

func main() {
  log.Printf("main")
}

to

package foo // package name based on directory name or go_binary rule name

import (
  "flag"
  "log"

  "github.com/u-root/u-root/pkg/bb"
)

// Type must inferred from type-checking flag.String.
// This means we must resolve dependencies through vendor, modules, bazel.
var global string

func Init0() {
  log.Printf("init")
}

func Init1() {
  global = flag.String("name", "", "")
}

func Init() {
  // Order of statements determined by types.Info.InitOrder [1]
  Init0()
  Init1()
}

func Main() {
  log.Printf("main")
}

func init() {
  bb.Register("foo", Init, Main) // [2]
}

[1] https://golang.org/pkg/go/types/#Info
[2] https://github.com/u-root/u-root/blob/master/pkg/bb/register.go

The rewritten packages are added as _ imports to this main.go file, which can then be compiled into one binary.

You can then access the command busybox-style through

./bb foo other-args...

or

ln -s bb foo
./foo other-args...

Use case is really space-constrained embedded environments (LinuxBoot).

Answer 72 · 2018-09-25T10:00:08.000Z

Crash in x/net/html #27846
We are also seeing some in compress/flate google/syzkaller#731
Both are potentially remotely-triggerable

Answer 73 · 2018-11-23T12:35:55.000Z

Over the past few days 3 new bugs in go-fuzz were reported:

another issue with cgo
modules not working
source-to-source instrumentation produces broken code on a weird corner case
@Sajmani

Answer 74 · 2018-11-23T13:35:54.000Z

Is there any specific reason why the golang project is not picking this up? In the world of distributed software go has become a go to language because of its properties. I am guessing that because of the very same properties ( managed code, etc ) fuzzing has been neglected. In the world of distributed software DOS can have devastating implications and the only way to build confidence is through fuzzing.

Other than distributed software, at my company we are working on a range of cloud based services, which do DICOM ( medical data formats ) parsing, authentication and a ton more. Currently there is no acceptable way to do integration based fuzzing and the only software, which tries to solve that problem (go-fuzz) is on the shoulders of one guy.

Either decide to pick fuzzing up and have it properly supported in a reasonable amount of time or at least decide that the feature is not going to be supported, so the community can try to work out some other solution.

Keeping this proposal on hold signals to the community that no action is needed.

PS. I had no luck with using AFL's -Q (qemu) feature for fuzzing go binaries and don't know if this would be possible at all.

Answer 75 · 2018-11-24T04:56:24.000Z

@knaxo There is no specific reason. Someone has to do the work. You suggest that the community could perhaps work out some other solution; I would suggest that the community could perhaps implement this solution.

Answer 76 · 2018-11-24T15:11:09.000Z

That was an expected answer and I fully understand and apologize if I sound pushy. I have no idea how you guys assign / pick work.

It would be nice if the golang team deliver proper instrumentation and coverage support, as it requires very specific compiler internals knowledge, in order to execute well. I doubt that we are going to be able to find anybody outside of golang devs that is able to do that work properly. My intuition is that the community will be able to get traction from there and deliver on the rest.

Answer 77 · 2018-11-24T23:34:59.000Z

It's not obvious to me why anything has to be done inside the compiler. The cover tool does not involve the compiler at all. The cover tool itself rewrites the Go code. That seems to me to be the way to go for a fuzzer; I think we would need a clear reason why that is insufficient.

Answer 78 · 2018-11-25T07:27:57.000Z

We don't have necessary support for source-to-source (s2s) transformation tools. The current coverage has custom ad-hoc support both in go tool and in bazel/blaze. It's not possible to mimic all of build system logic and cgo support on the side.
If we decide to go s2s route, we need corresponding support for this in go tool and in bazel/blaze. As I see it an s2s tool should receive sets of Go files that correspond to a single package, including all cgo magic, and be able to reasonably easily parse the files with go/types (this includes importing all depending packages somehow) and finally produce modified files.

Answer 79 · 2018-11-25T09:10:02.000Z

As I see it an s2s tool should receive sets of Go files that correspond to a single package, including all cgo magic, and be able to reasonably easily parse the files with go/types (this includes importing all depending packages somehow) and finally produce modified files.

Someone will have to come up with a standardized way for blaze/bazel/buck ("BBB") to pass type & dependency information to the s2s tool. I think you could conceivably come up with an interface from the s2s tool which can be implemented by both kinds of build systems separately.

The interface would/could probably be as simple as passing in & back out a version of https://godoc.org/golang.org/x/tools/go/packages#Package, which is so far the most comprehensive package information struct I've seen.

I think this could be implemented as a separate project outside of the toolchain, but you'd likely annoyingly always be trailing compiler features. u-root now has s2s transformation support for both the standard Go tool chain and a proposed (bit stale now..) PR for bazel/blaze support at u-root/u-root#927. See e.g. _uroot_rewrite_ast and the corresponding Go tool it passes cmdline information to -- it's a "simple" matter of getting BBB to collect the type and dependency information you need, passing them via cmdline args, and parsing them back out. Then pounding your s2s code into shape such that you can collect the same information from both BBB and the standard Go toolchain go/build stuff. (And yet, we haven't even had time for cgo or Go modules. That's just... a lot of work.)

I guess what I'm saying is... go/types won't be able to implement how blaze works. You have to get BBB to pass the information to you in the first place to get this right.

Answer 80 · 2018-11-25T13:44:43.000Z

@ianlancetaylor

I do not know the degree to which this is still accurate, but some additional concerns listed in the initial draft proposal document include:

However, go-fuzz suffers from several problems:

It breaks multiple times per Go release because it's tied to the way go build works, std lib package structure and dependencies, etc. It broke due to internal packages (multiple times), vendoring (multiple times), changed dependencies in std lib, etc.

It tries to do compiler work regarding coverage instrumentation without compiler help. This leads to build breakages on corner case code; poor performance; suboptimal quality of coverage instrumentation (missed edges).

Considerable difficulty in integrating it into other build systems and non-standard contexts as it uses source pre-processing.

Goal of this proposal is to make fuzzing as easy to use as unit testing.

Answer 81 · 2018-11-25T13:49:06.000Z

Go/packages is intended to enable tools to work independently of build system details, including differences in layout due to GOPATH vs modules, or bazel/blaze/buck. Tools should be able to use go/types in any of those environments as well. Source to source transformation as needed by coverage or fuzzing should be possible; I believe Ian's team is looking at coverage presently. Regarding staffing: I continue to work with my fellow Google managers to staff many new initiatives for Go. Fuzzing is high on my list. But I cannot say when the Go team will be able to do this, as there are many other competing priorities. S

…

On Sun, Nov 25, 2018 at 4:10 AM Chris K ***@***.***> wrote: As I see it an s2s tool should receive sets of Go files that correspond to a single package, including all cgo magic, and be able to reasonably easily parse the files with go/types (this includes importing all depending packages somehow) and finally produce modified files. Someone will have to come up with a standardized way for blaze/bazel/buck ("BBB") to pass type & dependency information to the s2s tool. I think you could conceivably come up with an interface from the s2s tool which can be implemented by both kinds of build systems separately. The interface would/could probably be as simple as passing in & back out a version of https://godoc.org/golang.org/x/tools/go/packages#Package, which is so far the most comprehensive package information struct I've seen. I think this could be implemented as a separate project outside of the toolchain, but you'd likely annoyingly always be trailing compiler features. u-root now has s2s transformation support for both the standard Go tool chain and a proposed (bit stale now..) PR for bazel/blaze support at u-root/u-root#927 <u-root/u-root#927>. See e.g. _uroot_rewrite_ast <https://github.com/u-root/u-root/blob/1ba88836b67a32c8bfc9d587aafbdcab333b85be/build.bzl> and the corresponding Go tool it passes cmdline information to <https://github.com/u-root/u-root/blob/1ba88836b67a32c8bfc9d587aafbdcab333b85be/tools/rewriteast/main.go> -- it's a "simple" matter of getting BBB to collect the type and dependency information you need, passing them via cmdline args, and parsing them back out. Then pounding your s2s code into shape such that you can collect the same information from both BBB and the standard Go toolchain go/build stuff. (And yet, we haven't even had time for cgo or Go modules. That's just... a lot of work.) I guess what I'm saying is... go/types won't be able to implement how blaze works. You have to get BBB to pass the information to you in the first place to get this right. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#19109 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AJSK3XoL9GPbZgKJVDSxUeKi2QzpKz2Vks5uyl6hgaJpZM4MCJif> .

Answer 82 · 2018-11-25T13:53:53.000Z

Backing up, in early 2017, @dvyukov had said above "I am ready to dedicate some time to work on parts of this".

After some discussion, the core Go team asked for a prototype before deciding whether or not to accept the proposal. For example, comments from Russ #19109 (comment), #19109 (comment), and #19109 (comment), including:

it still seems like the right next step is to make 'go-fuzz' the separate command as close to 'go fuzz' the proposed standard command as possible, and to add fuzz tests to at least the x subrepos and maybe the standard library, so that we can understand the implications of having them in the source repos (including how much space we're going to spend on testdata/fuzz corpus directories).

Putting this on hold until go-fuzz is more like the proposed 'go fuzz'.

As a random member of the community, it seems reasonable to ask for a prototype, especially given the care and thought that the Go team has put into things like how 'go test' works.

Another observation is that there is a fair amount of interest in the proposal from the broader Go community. Currently, if you sort by +1 reactions, this issue is ranked number 4 in the open GitHub issue list.

Something that I think could help a prototype move forward faster could be a slightly longer comment from the core Go about what the goals and non-goals of a prototype might be, especially regarding some sense of what might be required in a prototype to reach the point where the Go proposal review team could review a proposal.

That in turn might help different people from the community see how they might be able to help this.

However, I can also imagine it might be difficult for the core Go team to enumerate exactly what should be in a prototype, given I think at least part of the intent of asking for a prototype is to have greater clarity about exactly what is being proposed.

All that said, I will make up some goals. I am sure these will be incomplete or otherwise not reflective of the actual desired goals, but I wanted to throw out a strawman.

Draft Goals for a Prototype

To be done before an evaluation can be made by the Go proposal review committee:

Prototype proposed CLI, including interaction with existing 'go test'.
Add some sample fuzz tests to at least the x subrepos and maybe the standard library.
Start an initial set of corpus directories for the x repos and maybe the standard library (for example, earlier, the proposal suggested "For the standard library it is proposed to check in corpus into golang.org/x/fuzz repo").
Understand how much space is used in corpus directories for x subrepos and/or standard library based on those sample fuzz tests.
Add a new fuzzing signature (or change the existing Fuzz(data []byte) int signature) to work with testing.TB.

Draft Non-Goals for a Prototype

Build 100% of the exact desired compiler-level integration.
Allow the fuzzed function to take a *testing.F for error reporting (and could instead start with using testing.TB instead as suggested by Russ in #19109 (comment) ).

My personal reason for the split between goals/non-goals includes that items 1-5 are more externally visible aspects. Item 6 might be something that could be accepted or rejected at the proposal review stage off a design document and/or perhaps a basic exploratory proof-of-concept. However, that is pure conjecture on my part, and perhaps item 6 is actually considered high risk, and perhaps item 6 is considered an absolute requirement of a prototype prior to review by the proposal committee.

In terms of tapping into the community interest here — items 1-4 are things that are likely within the skillset of a decent-segment of the larger Go community. Item 5 might also be something that could be accomplished without deep knowledge of go-fuzz or Go itself.

Answer 83 · 2019-01-07T09:23:15.000Z

Wanted to briefly share fzgo, a simple work-in-progress prototype of the first few steps listed in the March 2017 proposal document. I've been sitting on the code for a while, including because it's not 100% clear if it helps the proposal discussion or not, but wanted to at least mention its existence and current state.

Mini-demo:

# After installing, start fuzzing by invoking fzgo.
# This example uses a package pattern.
# It automatically builds the instrumented binary (no separate manual prep step).

$ fzgo test github.com/thepudds/fzgo/examples/... -fuzz FuzzTime 
fzgo: building instrumented binary for sample.FuzzTime
fzgo: starting fuzzing
^C

# cd to the package directory and now do not use a package pattern.
# This second run starts quickly because it uses cached instrumentation from the first run.

$ cd $GOPATH/src/github.com/thepudds/fzgo/examples/time
$ fzgo test -fuzz FuzzTime
fzgo: using cached instrumented binary for sample.FuzzTime
fzgo: starting fuzzing

# 'fzgo' wraps the 'go' tool for commands it doesn't process itself. 

$ fzgo test github.com/thepudds/fzgo/fuzz
ok      github.com/thepudds/fzgo/fuzz   0.004s

Overall, this is intended to be a simple initial prototype, with the heavy lifting being done by go-fuzz, go-fuzz-build, and the go tool. You still need to first do go get -u github.com/dvyukov/go-fuzz/....

The current intent of the prototype is just a quick cut at "Step 1" from the immediately prior #19109 (comment):

"Step 1. Prototype proposed CLI, including interaction with existing 'go test'".

The caching is implemented using $GOPATH/pkg/fuzz as the cache location. The caching is coarse grained, but it at least makes it tolerable to try out different command invocations without waiting for the slow instrumentation step every time.

It obviously does not implement the complete proposal, but fzgo currently supports -fuzz regexp, -fuzzdir dir, -fuzztime duration, -parallel n, -timeout duration, -c, -v, and the fuzz and gofuzz build tags.

A few more details are in the README in the repo.

It is still work-in-progress, but wanted to at least mention this...

Answer 84 · 2019-01-07T10:15:50.000Z

FTR, issue about Go fuzzing integration in oss-fuzz:
google/oss-fuzz#36

Answer 85 · 2019-01-07T10:30:03.000Z

This is great! Thanks for doing this, @thepudds.
I see 2 main use-cases. First is one-off-like manual runs, and second is CI-like runs. We need to make sure that both are supported well. OSS-Fuzz is probably the right candidate for testing CI-like integration.

Step 2 in your list is adding some some sample fuzz tests to at least the x subrepos and maybe the standard library. golang.org/x/image looks like the best candidate for this.

Answer 86 · 2019-02-27T19:40:45.000Z

Please keep the "no plus one comments" rule in mind, and express support with an emoji reaction instead. Thanks!

https://github.com/golang/go/wiki/NoPlusOne

Answer 87 · 2019-03-12T13:52:42.000Z

Change https://golang.org/cl/167097 mentions this issue: tiff: add Fuzz function

Answer 88 · 2019-03-21T15:12:56.000Z

Change https://golang.org/cl/168558 mentions this issue: image/png: add Fuzz function

Answer 89 · 2019-04-26T09:59:17.000Z

Change https://golang.org/cl/174058 mentions this issue: encoding/json: add a Fuzz function

Answer 90 · 2019-04-30T11:53:22.000Z

Change https://golang.org/cl/174301 mentions this issue: html: add a Fuzz function

Answer 91 · 2019-04-30T14:06:58.000Z

Change https://golang.org/cl/174302 mentions this issue: encoding/csv: add a Fuzz function

Answer 92 · 2019-05-16T14:27:42.000Z

Fuzzing Evangelism Strike Force has wrote http://tiny.cc/why-go-fuzz if you are still thinking.

Answer 93 · 2019-05-16T21:41:37.000Z

Property-Based Testing Evangelism Strike Force is strongly supporting first class integration of fuzzing.

Property-based testing (with testing/quick being the simplest example) can be seen as generalization of fuzzing from []byte inputs to arbitrary higher-level data structures. It can be argued that it is thus more broadly applicable, as most software does not deal with []byte parsing directly.

Looking at the proposal as an author of property-based testing library (https://github.com/flyingmutant/rapid):

func FuzzFoo(*testing.F, []byte) interface is OK, but not great. While property-based testing library can construct arbitrarily complex data from []byte (if the slice is big enough), there is no way to communicate to fuzzer the structure of data being generated
func FuzzBar(*testing.F, []byte, a string, b int, c float64) extended interface inspired by testing/quick is probably a wrong approach. Tying generation algorithm to type of data is rather limiting, especially in a relatively type-light language like Go. Most modern property-based libraries (like Python's Hypothesis, Haskell's Hedgehog, Clojure's test.check, Rust's proptest) are based on explicitly specified generators and/or interactive random data generation instead.

I believe one of the most useful applications of property-based testing is "stateful" or "state machine" approach (here is how it looks in rapid, here is Hypothesis version), which is well suited to testing complex stateful systems. However, no pre-determined structure can be specified for the data, as the structure can depend on the data being generated (e.g. set of possible actions depends on the current state of the machine).

Thus, I'd like to propose to start with an interface similar to GetRandomData([]byte) from the beginning of the discussion, which later can be extended with means to specify structure:

type T interface {
        GetRandomData([]byte)  // can be called any number of times

        BeginSpan(label uint64) (id int)
        EndSpan(id int, discard bool)
}

This is really close to what Hypothesis or rapid are using internally right now. BeginSpan() and EndSpan() calls are optional; spans can be nested. disacard flag is an optimization for doing rejection sampling (which is used to implementfilter()-ing generators).

I believe there is great value in running property-based testing library on top of high performance coverage-avare fuzzer like go-fuzz, and hope that the chosen interface will be well suited for that.

Answer 94 · 2019-05-16T22:16:24.000Z

@flyingmutant You might be interested in the conversation in dvyukov/go-fuzz#218 and dvyukov/go-fuzz#223, where there is some discussion of prototyping fuzz.F and a corresponding Fuzz function signature. There is also a smaller amount of related discussion on that topic here.

Answer 95 · 2019-09-30T19:45:23.000Z

I really like the gofuzz library. Fuzz function could look something like this:

func FuzzXxx(f *fuzz.Fuzzer) {
    var ints []int
    f.Fuzz(&ints)
    sort.Ints(s)
}

...where f is seeded by the go tool chain. The seed should also be printed on failures so that tests can be reran to reproduce failure scenarios.

Answer 96 · 2019-12-03T14:27:15.000Z

Any news on this?

Answer 97 · 2019-12-03T15:22:58.000Z

There has been some recent work on #14565. There has not been any work on the go tool that I know of. I expect that any such work would be reported here.

Answer 98 · 2019-12-03T15:37:21.000Z

The proposal document has a section at the end that covers recent related work (though the nice recent work by @mdempsky in #14565 to add initial fuzzing coverage instrumentation in the go compiler is not yet mentioned there).

@palsivertsen regarding:

Fuzz function could look something like this:

func FuzzXxx(f *fuzz.Fuzzer) {
var ints []int
f.Fuzz(&ints)
sort.Ints(s)
}

FYI, fuzzing rich signatures is supported by the fzgo tool (which is WIP prototype that follows this #19109 proposal of integrating first-class fuzzing into go test). That ability to fuzz rich signatures gives you similar flexibility to what you are suggesting there, although fzgo uses the coverage guidance from dvyukov/go-fuzz along with storage of interesting results in a corpus, which typically outperforms just randomly generating input based off of a seed (if that is what you are suggesting).

Answer 99 · 2020-02-25T22:55:52.000Z

Following @flyingmutant's response, I agree that fuzzing as a first-class citizen with bring enormous benefit, but it needs to be more than just []byte inputs. When go-fuzz was initially released, I used it successfully to test parsers, but ran into obstacles applying fuzz testing to structured input. While go-fuzz could recognize the structure of JSON (or other repeated-structure textual formats), it could not recognize that I was using JSON as a means-to-an-end, and would spend the majority of its cycles fuzzing the JSON library, not the stateful high-level code I was hoping to test.

I believe there is a lot of opportunity for fuzzing to become a much better successor to testing/quick, perhaps even using source-guided value generation (e.g. of slices of structs, eventually with some field values generated from switch cases in the call-graph) instead of reflection-based APIs for common usecases. If well done, it could replace many, many table driven tests (the kind that's just hand-fuzzing) with simple Fuzz tests.

Conditions could also be statically specified, i.e.

func Fuzz(f *testing.F, a int, b struct { x int }) {
  // after the first *testing.F param, any other parameters may be used,
  // which will have generated values per iteration.

  // f.Require accepts only constant boolean-kinded expressions; toolchain
  // will reject if conditions are too complex or contradictory; multiple
  // f.Require calls are equivalent to a single call with &&'d expressions.
  f.Require(a > 0)
  f.Require(a < b.x && b.x < 1000)

  // ... actual test code that uses a and b
}

Answer 100 · 2020-03-03T08:14:28.000Z

If we do type aware fuzzing, we can also create interesting features like automatic minimization. @flyingmutant does this excellently in his Rapid library.

go-fuzz examples directories (fuzzing functions + corpus)

Draft Goals for a Prototype

Draft Non-Goals for a Prototype

`go-fuzz` examples directories (fuzzing functions + corpus)