rohanpadhye/JQF

"EOF reached" propagates to "Illegal State Exception"

sohah opened this issue · 4 comments

sohah commented

Hi,

Thanks for building and maintaining such a useful tool! I have two quick points:

  1. I think there is a slightly undesirable behavior when the ArbitraryLengthStringGenerator exceeds the MAX_INPUT_SIZE, which results in JQF throwing Illegal State Exception instead of EOF Exception.
    More precisely, this happens because StreamBackedRandom throws an exception here, yet to be caught in InputStreamGenerator here. This results in JQF resuming execution only to crash when the state is corrupted here. This behaviour is a bit unexpected, and would be better to just crash on the first exception, as it is indeed the reason for the problem.

I am attaching the code below to allow for replication of the behavior.

  1. Also, I observed that JQF includes some implementation for ExecutionIndexingGuidance, but it is not turned on by default. However, it sounds to be more accurate than Zest's Guidance with just LinearInput, any reason why this is the case? Was there any performance issues using the ExecutionIndexingGuidance when compared with Zest's Guidance?

Thanks!

@RunWith(JQF.class)
public class HelloJQFTest {

    @Fuzz
    public void fuzzHelloJQF(@From(ObjGenerator.class) List<String> list) throws IllegalArgumentException, IOException {
        (new HelloJQF()).handleRequest(list);
    }
}
public class ObjGenerator extends Generator<List> {
    public ObjGenerator() {
        super(List.class);
    }
    @Override
    public ArrayList<String> generate(SourceOfRandomness r, GenerationStatus status) {
        ArrayList<String> outputs = new ArrayList<>();
        while (r.nextBoolean()) {
            outputs.add(new ArbitraryLengthStringGenerator().generate(r, status));
        }
        return outputs;
    }
}
public class HelloJQF  {
    public void handleRequest(List<String> obj) throws IOException {
        for(String str : obj){
            System.out.println("executing the generated object");
        }
    }
}

Hi @sohah. Thanks for the report and detailed repro with points to code snippets from JQF.

  1. I think the main issue is here is that your generator is creating multiple instances of ArbitraryLengthStringGenerator. The generators based on InputStreams essentially consume every byte of "input" that the Guidance has available until EOF. These generators are expected to be used as a single instance at the top level (i.e., in a @From annotation in a single-argument @Fuzz method) for AFL-style fuzzing, because it guarantees that the input files that JQF saves in the fuzz-results/corpus directory actually represent the ENTIRE input byte-for-byte. These generators cannot be easily composed, e.g. to generate a List<String> as you have in your example. You are bound to get an IllegalStateException if you ask for more bytes after reaching EOF. Perhaps the error reporting can be improved, though, and I'm open to suggestions on how to handle this case. For List<String>, I would insist using the regular StringGenerator or some derivation of it (e.g. we have a DictionaryBackedStringGenerator in the examples module).
  2. Re: ExecutionIndexingGuidance: it is an experimental implementation that sort-of works but has not yet provided us with superior results, mainly because of the run-time overhead of tracking execution indexes during fuzzing. The guidance is not super well documented and no research results have been published so far. Use at your own risk, if you are able to make sense of the code.
sohah commented

Thanks, @rohanpadhye for the reply. I have two points to make regarding our discussion on the InputStream-based generators:

  1. If I understand the code correctly, the InputStreamGenerator, reads lazily, i.e., when a read(), or a readAll() is invoked on the stream for as many bytes as needed. Thus, it isn't really filling all the LinearInput, but rather it does so based on a controlled read. If one invokes readAll(), it will indeed fill all the available space in the LinearInput and EOF will be hit, but if we only do a couple of reads, using read(byte) (here), then this behavior is avoidable. I think this is the intention behind the ArbitraryLengthStringGenerator, which is bounded by the maxSize of 4096, and so if I make the MAX_INPUT_SIZE really large, or dually, make that maxSize smaller, then in principle, we can compose multiple ArbitraryLengthStringGenerator generators together without hitting the EOF. Am I making any false assumptions here?

  2. I do not understand why would we want to catch the EOF exception in the InputStreamGenerator (here). It seems to me that if we hit that, then it is safer to just not continue, after all we are crashing anyway later on because of IllegalStateException. The fix for me would be to remove that catch statement, but perhaps this catch statement needed for some other reason that I am not aware of.

  1. The code snippet you linked reads all bytes up to the max buffer size. The intention is to consume the rest of the stream in its entirety, with a fixed upper bound to prevent fuzzing from getting slowed down by ridiculously large inputs. Perhaps the 4096 number should be configurable instead of hard-coded, but that is the intention of the bound. No more bytes should be consumed after maxSize. The ZestGuidance has a larger bound for MAX_INPUT_SIZE because more bytes may be needed for other types of generators which use some random bytes for control decisions, not just the data to generate (as in strings). However, the InputStreamGenerator is most often used with AFLGuidance, where the guidance only produces fixed-size inputs (and not extensible streams like LinearInput in Zest).
  2. We need a way for the InputStream to signal EOF to its reader, which is done by returning a value of -1 (as per java.io.InputStream API). Let's say the fuzzer generates only a string of 10 bytes, which is much smaller than the MAX_INPUT_SIZE or any other bound, then when JQF attempts to get a new random byte from the SourceOfRandomness it will throw an IllegalStateException (because the randomness source doesn't know about which generator is being used; all it does is signal no more random bytes are available), and we want to convert this to an EOF from the point of view of the InputStream consumer (since this consumer does not know about the underlying tricks implemented by JQF). Note that all of this is most often applied when fuzzing with AFLGuidance, since AFL generates only fixed-size input files (which JQF will use to back the StreamBackedRandom source). You may not hit this condition with Zest, which can extend its LinearInput on demand.
sohah commented

@rohanpadhye , thank you for elaborating on the assumptions in building the InputStreamGenerator and its subclasses, and the need for -1 return value. For now, I am no longer using these generators with ZestGuidance. However, as far as I can see besides the above misleading message, and though they weren't built for that purpose, they seem to be composing just fine when used with ZestGuidance, if MAX_INPUT_SIZE is sufficiently larger than maxSize.
I appreciate your help, I'm closing this issue now.