rohanpadhye/JQF

Lower coverage with higher valid inputs

ihayet opened this issue · 1 comment

Hello. I have run the Maven ModelReaderTest with JQF+Zest multiple times, and I noticed that in one case the valid branch coverage decreases even though JQF produces a higher number of valid inputs. Below are the status screens for two runs: the first has a relatively lower valid input rate but higher valid branch coverage, while the second has a higher valid input rate but lower valid branch coverage.
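For context, a JQF+Zest driver for this kind of target typically looks roughly like the sketch below. This is an assumption about the user's test, loosely modeled on JQF's bundled Maven ModelReaderTest example; `XmlStringGenerator` is a placeholder name for whatever generator the test is actually configured with. An execution counts toward "Valid inputs" in the status screens only if the test method finishes without violating a JUnit assumption.

```java
import com.pholser.junit.quickcheck.From;
import edu.berkeley.cs.jqf.fuzz.Fuzz;
import edu.berkeley.cs.jqf.fuzz.JQF;
import org.apache.maven.model.io.xpp3.MavenXpp3Reader;
import org.codehaus.plexus.util.xml.pull.XmlPullParserException;
import org.junit.Assume;
import org.junit.runner.RunWith;

import java.io.IOException;
import java.io.StringReader;

@RunWith(JQF.class)
public class ModelReaderTest {

    // XmlStringGenerator is a hypothetical generator name; the real test may
    // use a different generator type and/or a dictionary.
    @Fuzz
    public void testWithGenerator(@From(XmlStringGenerator.class) String xml) throws IOException {
        try {
            // Code under test: parse the generated XML as a Maven POM model.
            new MavenXpp3Reader().read(new StringReader(xml));
        } catch (XmlPullParserException e) {
            // Parse failures are reported as assumption violations, so Zest
            // counts these executions as invalid rather than as test failures.
            Assume.assumeNoException(e);
        }
    }
}
```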

Status screen with lower valid input rate and higher branch coverage:

Semantic Fuzzing with Zest
--------------------------

Test name:            com.mvnXmlApp.mvnXmlApp.ModelReaderTest#testWithGenerator
Instrumentation:      Janala
Results directory:    /Users/...
Elapsed time:         5m 0s (max 5m 0s)
Number of executions: 292,682 (no trial limit)
Valid inputs:         41,019 (14.01%)
Cycles completed:     49
Unique failures:      0
Queue size:           34 (9 favored last cycle)
Current parent input: 14 (favored) {39/1000 mutations}
Execution speed:      492/sec now | 975/sec overall
Total coverage:       29 branches (0.04% of map)
Valid coverage:       28 branches (0.04% of map)

Status screen with higher valid input rate and lower branch coverage:

Semantic Fuzzing with Zest
--------------------------

Test name:            com.mvnXmlApp.mvnXmlApp.ModelReaderTest#testWithGenerator
Instrumentation:      Janala
Results directory:    /Users/...
Elapsed time:         5m 0s (max 5m 0s)
Number of executions: 273,762 (no trial limit)
Valid inputs:         61,663 (22.52%)
Cycles completed:     43
Unique failures:      0
Queue size:           37 (10 favored last cycle)
Current parent input: 13 (not favored) {18/44 mutations}
Execution speed:      1,093/sec now | 912/sec overall
Total coverage:       28 branches (0.04% of map)
Valid coverage:       27 branches (0.04% of map)

What could be the possible reasons for the valid branch coverage to decrease even with a higher valid input rate? Any help will be highly appreciated. Thanks in advance.

It's entirely possible that a fuzzing run which generates a valid input satisfying some hard-to-solve condition results in more coverage but a lower valid rate overall; the two need not be correlated. The main reason for showing the valid rate is to identify whether the fuzz driver / test method is overly strict. So, if you see a valid rate of around 0-5%, the assumptions are just too strong and fuzzing won't be effective. But once you get past a certain threshold, the fuzzer may generate lots of valid inputs that all cover the same part of the implementation, while uncovering a complex branch can lead to more invalidity because it requires satisfying multiple constraints in the input at once.
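To make the last point concrete, here is a contrived, self-contained sketch; it is not taken from the actual Maven implementation, and the three constraints are invented purely for illustration. A branch like the innermost one below is reached only when several independent properties of the input hold at once, so a run producing many valid-but-shallow inputs can show a higher valid rate yet lower valid branch coverage than a run that happens to synthesize one such deep input.

```java
import org.apache.maven.model.Model;
import org.apache.maven.model.io.xpp3.MavenXpp3Reader;
import org.codehaus.plexus.util.xml.pull.XmlPullParserException;

import java.io.IOException;
import java.io.StringReader;

public class DeepBranchExample {

    // Returns true only when the XML parses as a POM (constraint 1), declares
    // packaging "pom" (constraint 2), and lists at least one module
    // (constraint 3). Most randomly generated valid inputs satisfy only the
    // first constraint, so they add no new branch coverage here.
    public static boolean reachesDeepBranch(String xml) throws IOException {
        Model model;
        try {
            model = new MavenXpp3Reader().read(new StringReader(xml)); // constraint 1
        } catch (XmlPullParserException e) {
            return false; // input is invalid: it does not even parse
        }
        if (!"pom".equals(model.getPackaging())) {                      // constraint 2
            return false;
        }
        return model.getModules() != null
                && !model.getModules().isEmpty();                       // constraint 3
    }
}
```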