integrated fuzz testing
Closed this issue · 20 comments
Make it so that unit tests can ask for fuzz input:
```zig
test "foo" {
    const input_bytes = std.testing.fuzzInput(.{});
    try std.testing.expect(!std.mem.eql(u8, "canyoufindme", input_bytes));
}
```
Introduce flags to the compiler: `-ffuzz`, `-fno-fuzz`. These end up passing `-fsanitize=fuzzer-no-link` to Clang for C/C++ files. Introduce an equivalent build system API.
However, neither the CLI interface nor the build system interface is needed in order to enable fuzzing. The only thing that is needed is to ask for fuzz input in unit tests, as in the above example.
When the build runner interacts with the test runner, it learns which tests, if any, are fuzz tests. Then, when unit tests pass, it moves on to fuzz testing by providing our own implementation of the genetic algorithms that drive the input bytes (similar to libFuzzer or AFL), and re-compiling the unit test binary with `-ffuzz` enabled.
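The loop driving those input bytes could look roughly like this minimal coverage-guided sketch. Everything here is hypothetical - `keepFuzzing`, `mutate`, `runOneIteration`, and the coverage type are stand-ins for whatever the real support library ends up providing:

```zig
// Hypothetical sketch of a coverage-guided mutation loop.
// None of these names are real std APIs.
var prng = std.Random.DefaultPrng.init(seed);

while (keepFuzzing()) {
    // Pick a known-interesting input from the corpus and mutate it.
    const base = corpus.items[prng.random().uintLessThan(usize, corpus.items.len)];
    const input = try mutate(gpa, base, prng.random());

    // Run one test iteration; the -ffuzz instrumentation records which
    // branches were taken for this input.
    const coverage = try runOneIteration(test_fn, input);

    // Inputs that discover new branches are kept for further mutation;
    // everything else is discarded.
    if (coverage.foundNewBranches()) {
        try corpus.append(input);
    } else {
        gpa.free(input);
    }
}
```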
Fuzz testing is open-ended, so we will need some CLI options to control it. For example, `zig build --fuzz` might start fuzzing indefinitely, while `zig build --fuzz=300s` declares success after fuzzing for five minutes. When fuzz testing is not requested, it defaults to a small number of iterations just to smoke-test that it's all working.
Some sort of UI would be nice. For starters this could just be `std.Progress`. In the future perhaps there could be a live-updating HTML page to visualize progress and code coverage in realtime. How cool would it be to watch source code turn from red to green live as the fuzzer finds new branches?
I think there's value in being able to fuzz test a mix of Zig and C/C++ source code, so let's start with evaluating LLVM's instrumentation and perhaps being compatible with it, or at least supporting it. First step is to implement the support library in Zig.
`-ffuzz` will be made available as a comptime flag in `@import("builtin")` so that it can be used, for example, to choose the naive implementation of `std.mem.eql`, which helps the fuzzer to find interesting branches.
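To illustrate why the naive implementation matters: a byte-at-a-time comparison exposes one conditional branch per matched byte, so coverage feedback rewards each additional correct prefix byte of a magic value like `"canyoufindme"`, whereas a vectorized compare is all-or-nothing. A simplified sketch of the branch-rich variant (the real `std.mem.eql` is more general):

```zig
// Simplified byte-at-a-time equality check. Each iteration is a
// separate branch, so coverage instrumentation can observe partial
// matches and guide the fuzzer toward the full magic string.
fn eqlNaive(a: []const u8, b: []const u8) bool {
    if (a.len != b.len) return false;
    for (a, b) |x, y| {
        // A new branch is discovered for every additional matching byte.
        if (x != y) return false;
    }
    return true;
}
```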
Comments are welcome. Note this is an enhancement, not a proposal. The question is not "whether?" but "how?".
Related:
Note that fuzz testing benefits a lot from starting with an input corpus of short, unique, and relevant inputs:
So Zig will likely want to have a way to provide the build/test runner with seed inputs as well. Imaginary syntax:
```zig
test "foo" {
    std.testing.fuzzCorpus(&.{
        @embedFile("inputs/input01"),
        @embedFile("inputs/input02"),
    });
    const input_bytes = std.testing.fuzzInput();
    try std.testing.expect(!std.mem.eql(u8, "canyoufindme", input_bytes));
}
```
But this may not be ideal since it's part of the test code itself. Separating things out into some sort of separate "setup the fuzzer" + "provide a function to repeatedly call" might be worthwhile.
(side note: dictionaries can also be helpful and would similarly ideally be provided in some sort of setup phase)
FWIW here's what Go's integrated fuzz testing looks like:
```go
func FuzzReverse(f *testing.F) {
    testcases := []string{"Hello, world", " ", "!12345"}
    for _, tc := range testcases {
        f.Add(tc) // Use f.Add to provide a seed corpus
    }
    f.Fuzz(func(t *testing.T, orig string) {
        // ...
    })
}
```
> When the build runner interacts with the test runner, it learns which tests, if any, are fuzz tests.
How does this mechanism work? If you've not thought about this yet, as a random (possibly bad) idea: perhaps `std.testing.fuzzInput` returns `error{NeedFuzz}![]const u8` or similar, and if `-ffuzz` is not provided, it just returns `error.NeedFuzz`, which the caller propagates with `try` and the test runner can then report to the build runner.
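As a sketch of that idea (all names hypothetical; `builtin.fuzz` standing in for however `-ffuzz` gets surfaced at comptime, and `currentFuzzBytes` for the support library's input source):

```zig
// Hypothetical: fuzzInput signals "this is a fuzz test" when no
// fuzzing engine is attached, letting the runner skip it and record it.
pub fn fuzzInput() error{NeedFuzz}![]const u8 {
    const builtin = @import("builtin");
    if (!builtin.fuzz) return error.NeedFuzz;
    return currentFuzzBytes(); // provided by the fuzzing support library
}

test "foo" {
    // `try` propagates NeedFuzz; the test runner reports this test as a
    // fuzz test to the build runner instead of failing it.
    const input_bytes = try fuzzInput();
    try std.testing.expect(!std.mem.eql(u8, "canyoufindme", input_bytes));
}
```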
Example of a repo that does it today (with AFL integration, and with separate `zig build test` and `zig build fuzz` steps): https://github.com/nektro/zig-json
> How does this mechanism work?
The build runner already runs the test runner as a child process with the test runner protocol over stdio, so that it can keep running unit tests when one of them crashes the process, and check that a unit test triggered a safety panic as expected (#1356). It also makes the parent process know which test was being executed if the unit test crashes the process.
Doing this over stdio is super handy because it even works in strange environments such as via QEMU, wine, or wasmtime.
The function can set a flag indicating that a fuzz test was encountered, then return random bytes (smoke test). Before the test runner sends EOF to the parent process it will send a message indicating metadata about the fuzz tests in the compilation. The build runner then has all the information it needs to enter Fuzz Mode after the main build pipeline is done.
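Sketched out, the test-runner side might look like this (the names here are invented for illustration and are not the actual test runner protocol):

```zig
// Inside the test runner (hypothetical sketch).
var saw_fuzz_test = false;

pub fn fuzzInput(options: FuzzInputOptions) []const u8 {
    saw_fuzz_test = true;
    // Smoke test: no fuzzing engine attached, so just hand back
    // a few random bytes and let the test run once.
    return randomSmokeTestBytes(options);
}

fn finish(out: anytype) !void {
    // Before sending EOF to the build runner, report which tests asked
    // for fuzz input so it can enter Fuzz Mode after the main pipeline.
    if (saw_fuzz_test) try sendMessage(out, .fuzz_test_metadata);
}
```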
That makes sense -- nicely designed.
Here's a tangentially related question. Like other parts of the compiler, our testing infrastructure is moving towards a strong bias to running via the build system. Is there, perhaps, an argument to be made for renaming `zig test` to `zig build-test`, and maybe even eliminating the non-compiler-protocol test runner functionality? The standalone command provides a worse UX, but its name can kind of indicate to people that it's "the way" to test their code; this often leads to people doing incorrect things like trying to `zig test` individual files within a project (when the correct thing is to test their entire project with a test filter set).
This fuzzing stuff is another example of very tight integration between the build system and compiler, where directly running `zig test` would at the very least provide a worse UX. (I don't quite understand what the `-ffuzz` option is intended to do to Zig code, if anything, so I don't have a solid grasp of whether it would work at all; does the test runner or the build runner provide the fuzzing inputs?)
Perhaps this is a silly idea; but if you think it has some merit, I'll spin it off into a separate proposal.
This would also apply to `build-exe`, `build-lib`, `build-obj`, `translate-c`, and `objcopy`. I think there is value in supporting both workflows; the simplicity of using the lower-level commands is quite handy when troubleshooting. I think it's fine if people use `zig test` to test a single file, as long as it works, but of course the build system is there for managing more complex invocations as well as multiplexing.
The fuzz tests in `zig test` mode would still run but would only do 1 iteration each, with (probably useless) random input. Perfect for writing the fuzz test before you actually want to give it a spin with `zig build --fuzz`, and for noticing when you broke it.
To answer your question about `-ffuzz`: it enables instrumentation in the generated code so that the fuzzer gets feedback on the branches that were taken based on its generated input. This helps it search the state space much more efficiently. The idea here is that there would be two builds of the unit tests - one without this instrumentation for unit tests, and one with the instrumentation that also links in the support library code, for doing fuzz testing.
Edit: now that I think about it, I don't think it would be that hard to make `-ffuzz` work in combination with `zig test` as well, although my driving motivation is still the all-powerful `zig build` integration.
> The fuzz tests in `zig test` mode would still run but would only do 1 iteration each, with (probably useless) random input. Perfect for writing the fuzz test before you actually want to give it a spin with `zig build --fuzz`, and for noticing when you broke it.
IMO the ideal would be that in `zig test` mode it would run the test once with each input in the provided corpus. For a well-constructed corpus, this would actually test many different code paths (while being finite + quick).
However, I can't really think of a way to make defining an input corpus work with Zig's current test syntax, so a proof-of-concept that always fuzzes starting with an empty input is probably the way to go.
> Some sort of UI would be nice. For starters this could just be `std.Progress`. In the future perhaps there could be a live-updating HTML page to visualize progress and code coverage in realtime. How cool would it be to watch source code turn from red to green live as the fuzzer finds new branches?
Fuzzing is often done in a distributed manner: ten machines simultaneously running the fuzzer. To enable these kinds of use-cases, it would be useful to access the results from the build system. E.g., the fuzz step could produce a report in a JSON file, which you could then use as an input to a "CreateGitHubIssueStep" or some such.
Here's a half-baked idea that maybe somebody could turn into something workable: have a mechanism to ensure that fuzzing hits a certain line of code and that shows a failure otherwise.
> Here's a half-baked idea that maybe somebody could turn into something workable: have a mechanism to ensure that fuzzing hits a certain line of code and that shows a failure otherwise.
Sounds related to sometimes assertions.
> The fuzz tests in `zig test` mode would still run but would only do 1 iteration each, with (probably useless) random input. Perfect for writing the fuzz test before you actually want to give it a spin with `zig build --fuzz`, and for noticing when you broke it.
>
> IMO the ideal would be that in `zig test` mode it would run the test once with each input in the provided corpus. For a well-constructed corpus, this would actually test many different code paths (while being finite + quick).
>
> However, I can't really think of a way to make defining an input corpus work with Zig's current test syntax, so a proof-of-concept that always fuzzes starting with an empty input is probably the way to go.
Instead of specifying a corpus in Zig code, what about providing it to the test/build runner on the CLI? Could we have `--fuzz` take an optional argument specifying a corpus directory? Since different tests presumably will want different corpuses, we'd need some mechanism for associating different input files with tests - maybe something simple like sub-directories named by the fully qualified name of a test would work.
When fuzzing with AFLPlusPlus I often have updated my corpus with new seed files from a previous fuzzing run so that the next run doesn't have to re-explore the same search space from scratch. For this reason, I think it would make more sense for the input corpus to not be specified in the code. With a CLI flag, the build-runner could even be made to automatically update the corpus with new seeds if desired.
> Instead of specifying a corpus in Zig code, what about providing it to the test/build runner on the CLI?
Depends what the intended use cases are. From the OP, it sounds like running multiple fuzz tests (for a finite amount of time each) is an intended use case, so specifying a corpus for each fuzz test via the CLI might be a bit tricky. Reading from some particular location based on the fully qualified test name would work but would make renaming/moving tests around a chore (and a potential footgun-of-sorts if you don't realize there's a mismatch in the corpus/test FQN).
Existing languages have a lot of magic re: how fuzzing targets are defined. For example, Go requires targets to:
- Be contained in a `_test.go` file.
- Have a function name starting with `Fuzz`.
- Be defined as a void function with a `*testing.F` as its only parameter.
- Use only a small list of built-in types for tests.
Fuzzing in Rust via cargo-fuzz is better in that:
- Targets are defined using the `fuzz_target` macro.
- Custom types can be created via the `arbitrary` crate.
This is how the `glob-match` crate is fuzzed, something I missed when I made a Zig port of it.
IMO we could get the best of both worlds by having a `fuzz` block, similar to the existing `test` blocks, along with a separate `std.fuzz` namespace for e.g. adding data to a corpus and creating arbitrary types.
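A sketch of what that might look like (entirely hypothetical - neither the `fuzz` block syntax nor the `std.fuzz` namespace exists):

```zig
// Hypothetical syntax: a dedicated `fuzz` block with a setup phase.
fuzz "find magic string" {
    // Seed corpus registered up front, separate from the test body.
    std.fuzz.corpus(&.{
        @embedFile("inputs/input01"),
        @embedFile("inputs/input02"),
    });
    const input = std.fuzz.input();
    try std.testing.expect(!std.mem.eql(u8, "canyoufindme", input));
}
```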
> Instead of specifying a corpus in Zig code, what about providing it to the test/build runner on the CLI?
>
> Depends what the intended use cases are. From the OP, it sounds like running multiple fuzz tests (for a finite amount of time each) is an intended use case, so specifying a corpus for each fuzz test via the CLI might be a bit tricky. Reading from some particular location based on the fully qualified test name would work but would make renaming/moving tests around a chore (and a potential footgun-of-sorts if you don't realize there's a mismatch in the corpus/test FQN).
With the plan of a two-pass system where the first pass detects which tests are fuzz tests, perhaps we could have a `std.testing.fuzzCorpusDir("path/to/corpus/directory")` which is called in a test and used in the first pass to register a corpus directory for that test; that information is relayed back to the build runner for use when compiling in fuzz mode. `std.testing.fuzzCorpusDir` would be a no-op when compiled with fuzzing active.
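Such a registration might look like this (hypothetical API, following the imaginary syntax used earlier in the thread; `parseAndValidate` is a made-up function under test):

```zig
// Hypothetical: the first pass records the directory and relays it to
// the build runner; in fuzz mode the call is a no-op because the
// engine has already loaded the corpus.
test "parse" {
    std.testing.fuzzCorpusDir("corpus/parse");
    const input_bytes = std.testing.fuzzInput(.{});
    try parseAndValidate(input_bytes);
}
```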
For those who know more about fuzzers and instrumentation: how hard would it be to make this generic enough to allow integrations with different instrumentation/fuzzing libraries? That is, letting you plug in fuzzing engines - for example, if I were making a Zig library for a different language that used a specific fuzzer, I might want to fuzz the calls into Zig using that same system (getting coverage etc.). Almost like having "custom fuzz runners + integration" the same way we can have custom build and test runners?
> The fuzz tests in `zig test` mode would still run but would only do 1 iteration each, with (probably useless) random input.
Non-deterministic CI failures ahoy!
After fuzzing in a lot of different projects, I like the interface in Go: in test mode, just run the provided inputs, and in fuzz mode, use those inputs to seed the corpus.
Many fuzzing tools also have a corpus minimization option which produces the minimum set of inputs that obtain the same coverage as the full corpus. I like to copy those back into the fuzz test to get good coverage in test mode.
> Non-deterministic CI failures ahoy!
Some minor quality-of-life options from other tools:
- Set a timeout after which the fuzz test will be killed and the process restarted.
- Choose whether timing out is considered a fail or a pass (eg timing out when fuzzing an interpreter is expected).
- Try to detect unique failures by recording basic program state (eg a honggfuzz failure looks like `SIGSEGV.PC.5555556f9ec4.STACK.19f217c3bb.CODE.1.ADDR.7fffff7fed70.INSTR.mov____%r8d,-0x320(%rbp).fuzz` - any failures with matching values will be reported as duplicates, and by default only the first failure and the smallest failure will be reported). This is invaluable if you have some unfixable bugs but still want to fuzz for new bugs.
- As @matklad mentioned above, have some way to export and merge coverage reports. Useful for monitoring (it's quite easy to accidentally break fuzzer coverage and not notice) and for additional tools like 'sometimes asserts'.
> - Set a timeout after which the fuzz test will be killed and the process restarted.
> - Choose whether timing out is considered a fail or a pass (eg timing out when fuzzing an interpreter is expected).

As someone with a lot of experience fuzzing in Go, I can't stress enough how important these are. Without them, continuous fuzzing is essentially broken in Go: golang/go#48157, golang/go#56238, golang/go#52569
> Non-deterministic CI failures ahoy!

We do something at TigerBeetle here. I am not sure if what we do is brilliant or cursed. What we do is use the commit SHA as the seed for the "run fuzz tests once on CI" check:
- this gives you deterministic results, where you don't have to fiddle with CI logs to fish out the seed, because knowing the commit hash is enough
- but it still avoids the pitfall of using a single seed and then, eg, always going down one branch of swarm testing
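A sketch of that seeding scheme (hypothetical; assumes CI exposes the commit SHA via an environment variable named `GIT_COMMIT`):

```zig
// Hypothetical: derive a deterministic fuzz seed from the commit SHA,
// so a CI run is reproducible from the commit alone.
const sha = std.posix.getenv("GIT_COMMIT") orelse "0000000000000000";
// The first 16 hex digits of the SHA fit exactly into a u64 seed.
const seed = std.fmt.parseInt(u64, sha[0..16], 16) catch 0;
var prng = std.Random.DefaultPrng.init(seed);
```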