rust-fuzz/libfuzzer

Check min size needed with `Arbitrary::size_hint` and early exit if the data isn't long enough

fitzgen opened this issue · 8 comments

@bnjbvr was reporting that starting fuzzing from scratch with a fuzz target that takes an Arbtirary impl was spending a lot of time on three bytes long inputs, where the Arbitrary implementation required more input bytes than given. The fuzzer wasn't giving more input bytes fast enough to get to the meaty bits of the fuzz target.

In theory, we could use Arbitrary::size_hint to create a bunch of random seeds for the corpus of the appropriate size and/or control the maximum length of inputs that libfuzzer will generate.

I'm not sure exactly what this would look like, but it probably requires another env var dance between cargo fuzz and libfuzzer-sys like how the debug printing works.

As far as how this is exposed to users, maybe we should add cargo fuzz seed <target> as a new subcommand?

Or maybe we can just document that users can do something like head -c 100 /dev/random > fuzz/corpus/my-target/my-random-seed to add a random 100 bytes seed to their fuzz target's corpus to get things off the ground...

it feels like libfuzzer should have an option for this? if not perhaps we should ask for one?

What are you imagining that libfuzzer would do? It doesn't know whether it provided enough bytes for Arbitrary or not. I'm not sure what you mean here.

Hello! I'm new to cargo-fuzz but I've encountered similar issues when implementing the Zest algorithm in JQF (a structured-input fuzzer for Java). The solution for this problem was to extend the input on-demand when the Quickcheck generators (analogous to Arbitrary) request more bytes, and then cap this extension to some max limit. The relevant source code is here: https://github.com/rohanpadhye/jqf/blob/aeff24a79e409f6ab2c1b8a5c987dbafb091461d/fuzz/src/main/java/edu/berkeley/cs/jqf/fuzz/ei/ZestGuidance.java#L1068-L1099

I don't know much about libfuzzer internals, but it would be good if a test driver could actually modify and extend the byte array that libfuzzer provides during test execution, so that cargo-fuzz (or libfuzzer-sys) could just populate any extra bytes with freshly generated random values on demand. That would be such a useful feature for libfuzzer to have in many different scenarios other than powering Arbitrary...

@fitzgen ./fuzz --min-len=foo, and a way to pass that down. This would require executing the fuzz target with another env var to get the size hint.

@rohanpadhye Worth proposing to libFuzzer. I suspect that given the general fuzzer focus on fuzzer-driven mutation this may not actually work that well, but worth a shot either way.

Thanks for the insights @rohanpadhye!

The solution for this problem was to extend the input on-demand when the Quickcheck generators (analogous to Arbitrary) request more bytes, and then cap this extension to some max limit. The relevant source code is here: https://github.com/rohanpadhye/jqf/blob/aeff24a79e409f6ab2c1b8a5c987dbafb091461d/fuzz/src/main/java/edu/berkeley/cs/jqf/fuzz/ei/ZestGuidance.java#L1068-L1099

@Manishearth, this reminds me of how during the redesign of Unstructured, we debated whether to error early when running out of bytes or to artificially extend it with zero bytes. We thought that exiting early would avoid confusing the fuzzer, and keep it in control of exploring the input space. Maybe this was the wrong choice?

We could fairly easily start returning zeros after we've exhausted the input (and have a max limit on how many zeros we return just in case something gets in a loop).

But if we were to actually pursue this, first I would want to take a real world fuzz target (or multiple targets) and test how much code coverage we get after X period of time with and without the zero extending, when starting from an empty corpus. Unfortunately, I don't have the cycles to run the experiment...

Anyways, I'll file an issue for a -min_len flag for libfuzzer.

Oh another idea is we could do something like this:

if data.len() < Arbitrary::<FuzzTargetType>::size_hint().0 {
    return;
}

...

in the libfuzzer_sys::fuzz_target! macro. That would at least cut down on the reported code covered inside Arbitrary impls and also be a bit faster.

In fact, I think that would be the best way to handle this.