sosy-lab/sv-benchmarks

Issue with zilu benchmark

hernanponcedeleon opened this issue · 10 comments

benchmark07_linear has true as expected result.
However, isn't the following a violation?

Initial state

  • i=0
  • n=1
  • k=-3999
  • flag=true

End state

  • k=1
  • n=1

Am I missing something?

I also think many of the benchmarks can result in overflow, thus leading to undefined behaviour. Below some examples

benchmark12_linear

x = 1610614784
y = 805306368
t = 805306368
y=y+x -> overflow

benchmark28_linear

i = 65536
i * i -> overflow

In the later case, the problem is not just the overflow, the following assignment also leads to a violation

i=-1
j=-2

I think you are correct, thanks for the hint!. In the paper they were mainly concerned with (linear) invariants and probably didn't account for overflows.

For benchmark07_linear I think they might have forgotten to constrain k to non-negative values.

That probably explains why none of the verifiers in table 1 in the publication (https://lijiaying.github.io/papers/ase17.pdf) were able to "find the invariant".
Other tasks where this is the case need manual investigation as well I think, these are at least benchmarks 8,20,28,30,52,53.

According to Dartagnan, all the following would result in overflow

benchmark12_linear
benchmark27_linear
benchmark28_linear
benchmark29_linear
benchmark31_disjunctive
benchmark39_conjunctive
benchmark40_polynomial
benchmark44_disjunctive
benchmark45_disjunctive
benchmark46_disjunctive
benchmark47_linear
benchmark48_linear
benchmark50_linear
benchmark53_polynomial

And even if the overflow problem would be solved in benchmark28_linear (also none tool in that paper succeed here), I still think the result is wrong (see my comment above)

Not sure what is the best course of action to solve the benchmarks for the reachability property (solution might be benchmarks independent), but if we don't find a solution before the verification tasks freeze (tomorrow), I think we should at least remove them from the reachability category and add them to the no-overflow category.

I think there is a misunderstanding. Tomorrow is not the final task freezing. You need to have made your PR by tomorrow if you want your new tasks in the competition. PRs that are made later will not be used in the competition, unles the jury votes otherwise. There is time to fix tasks in the repo and in existing PRs until 13th of November:

Nov. 13, 2020 Freezing of verification tasks (pull requests after deadline only merged after jury approval)

ohhhh that's very good to know.
I was panicking because I found many UB (also in other benchmarks) and I was not sure I was gonna be able to propose solutions to all of them before the deadline.

The first deadline is just to prevent people from adding completely new tasks shortly before the freezing. If there are bugs in the already existing tasks, we accept fixes until the deadline on 13th of November. If we find bugs after that, then those tasks are excluded for SV-COMP 2021 on a separate branch, where they will simply be removed. PRs with fixes can of course still be made, but only against master and not on that SV-COMP 2021 branch.
@dbeyer maybe you can confirm that I am not completely wrong here.

CPAchecker actually found even more overflows:

benchmark01_conjunctive.i
benchmark06_conjunctive.i
benchmark07_linear.i
benchmark12_linear.i
benchmark15_conjunctive.i
benchmark21_disjunctive.i
benchmark27_linear.i
benchmark28_linear.i
benchmark29_linear.i
benchmark30_conjunctive.i
benchmark31_disjunctive.i
benchmark39_conjunctive.i
benchmark40_polynomial.i
benchmark42_conjunctive.i
benchmark44_disjunctive.i
benchmark45_disjunctive.i
benchmark46_disjunctive.i
benchmark47_linear.i
benchmark48_linear.i
benchmark49_linear.i
benchmark50_linear.i
benchmark53_polynomial.i

I better get going in fixing all of them 🤕

These require quite different fixes, so for now I just marked them as no-overflow with expected verdict false in the yml files.

Here is the fixes I would perform for the first 5 tasks. Only 2 programs have the same fix. Fixing this would become pretty time-consuming and will probably not happen for this year's competition.

benchmark01_conjunctive.i

at start of while loop body insert:

if (__builtin_add_overflow_p (x, y, x)) break;

benchmark06_conjunctive.i

make all types unsigned int

benchmark07_linear.i

bound variable k to be somewhere inbetween -3999<k<INT_MAX-4000*10

benchmark12_linear.i

at start of while loop body insert:

if (__builtin_add_overflow_p (x, y, x)) break;

benchmark15_conjunctive.i

complete rewrite of the precondition?! It currently is:
if (!(low == 0 && mid >= 1 && high == 2*mid)) return 0;
Instead we would want something like this:

low = 0;
if not (high>=2) return 0;
mid = high/2;
high = high - high %2;

Fixed in #1191 .