sosy-lab/sv-benchmarks

Inappropriate benchmarks in MemSafety

versokova opened this issue · 15 comments

Most of benchmarks in MemSafety-TerminCrafted do nothing related to memsafety (dereference, free, memory tracking). What do you think? I suggest removing them from the category.

termination-crafted/2Nested-1.c
termination-crafted/4NestedWith3Variables-1.c
termination-crafted/Ackermann-2.c
termination-crafted/Bangalore_v2.c
termination-crafted/Bangalore_v4.c
termination-crafted/Bangalore-2.c
termination-crafted/Benghazi_nondet-2.c
termination-crafted/Benghazi.c
termination-crafted/Binary_Search-2.c
termination-crafted/Cairo_step2-1.c
termination-crafted/Cairo_step2-3.c
termination-crafted/Cairo.c
termination-crafted/Copenhagen.c
termination-crafted/Copenhagen_disj-2.c
termination-crafted/Division-2.c
termination-crafted/Gothenburg_v2-1.c
termination-crafted/Gothenburg-1.c
termination-crafted/Lobnya-Boolean-Reordered-2.c
termination-crafted/Madrid.c
termination-crafted/McCarthy91_Iteration.c
termination-crafted/McCarthy91_Recursion.c
termination-crafted/MenloPark.c
termination-crafted/MutualRecursion_1a.c
termination-crafted/MutualRecursion_1b.c
termination-crafted/Mysore-2.c
termination-crafted/Mysore-3.c
termination-crafted/NestedRecursion_1a-2.c
termination-crafted/NestedRecursion_1b.c
termination-crafted/NestedRecursion_1c.c
termination-crafted/NestedRecursion_1d.c
termination-crafted/NestedRecursion_2b.c
termination-crafted/NestedRecursion_2c.c
termination-crafted/NonTerminationSimple7.c
termination-crafted/Nyala-2lex-2.c
termination-crafted/Parallel.c
termination-crafted/Piecewise.c
termination-crafted/Pure2Phase-1.c
termination-crafted/Pure3Phase-1.c
termination-crafted/RecursiveMultiplication-1.c
termination-crafted/RecursiveNonterminating-1.c
termination-crafted/Rotation180-1.c
termination-crafted/Singapore-2.c
termination-crafted/Stockholm-1.c
termination-crafted/TelAviv-Amir-Minimum.c
termination-crafted/Thun-2.c
termination-crafted/Toulouse-BranchesToLoop-2.c
termination-crafted/Toulouse-MultiBranchesToLoop-2.c
termination-crafted/Waldkirch.c
termination-crafted/WhileFalse.c
termination-crafted/WhileTrue.c
termination-crafted/aaron2-1.c
termination-crafted/aaron3-1.c
termination-crafted/easy1.c
termination-crafted/easy2-2.c

The rest of benchmarks (11 files) are reasonable for memsafety.

I agree with this, the MemSafety category should contain only benchmarks that somehow manipulate memory - at least dereference a pointer, access array item, etc. The mentioned benchmarks do not satisfy this requirement.

@dbeyer can you please elaborate on this? I suggest filtering these benchmarks and adding only those manipulating memory to MemSafety-TerminCrafted (or, since there are only 11 such benchmarks, merging them into another sub-category of MemSafety).

The mentioned files do not contain any memory safety violation, i.e., the verdict for valid-memsafety.prp is always true. So by removing them we would not lose much on first glance.

On the other hand these should also not take long to verify, and it is also good to have files. A tool could have problems with proving a non-terminating program to be safe regarding valid-memsafety.prp, so the tasks are not without use.

Anyway, I think this is a misunderstanding: MemSafety-TerminCrafted is not a subcategory of SV-COMP's MemSafety category, and I do not think @dbeyer has any plans to add them. So I guess this issue can be closed?

@MartinSpiessl I guess not. Why do you think it is not a part of SV-COMP'20? Do you have any information that the jury does not know about? The MemSafety benchmarks are generated incorrectly on the website https://sv-comp.sosy-lab.org/2020/benchmarks.php. And this commit speaks for itself.

A tool could have problems with proving a non-terminating program to be safe regarding valid-memsafety.prp, so the tasks are not without use.

You are right. Then the description MemSafety-TerminCrafted.cfg isn't correct.

Why do you think it is not a part of SV-COMP'20? Do you have any information that the jury does not know about? The MemSafety benchmarks are generated incorrectly on the website https://sv-comp.sosy-lab.org/2020/benchmarks.php. And this commit speaks for itself.

Thanks for the hint, I must have overlooked that. I looked up the benchmark definitions and though MemSafety-TerminCrafted was not inside, but you are obviously right.

So to summarize:
So we just correct the description in MemSafety-TerminCrafted.cfg after which this issue can be closed. Or is there something else to discuss?
EDIT: Oh and of course https://sv-comp.sosy-lab.org/2020/benchmarks.php needs to be updated

But your argument is still absurd. So you basically said I have to add benchmarks with complex data structures to the Termination category. You do not have any access to the memory in these benchmarks.

Actually, if MemSafety-TerminCrafted is not to be a sub-category of the MemSafety category, then it would make sense to rename it so that this misunderstanding doesn't happen again. What about Termination-MemSafetyCrafted since, if I understand correctly, this should be a sub-category of the Termination category?

But your argument is still absurd.

You did not indicate this in your first answer, I thought you agreed:

You are right. Then the description MemSafety-TerminCrafted.cfg isn't correct.

What does this have to do with the Termination category?

So you basically said I have to add benchmarks with complex data structures to the Termination category.

I never said this. You can do this, sure, and there are reasons why this might indeed be useful, that is what you can deduce from what I said. In real-world software, there is no strict separation of features, so adding tasks that combine features will make SV-COMP more relevant for real-world applications. Adding tasks with complex data structures for termination does indeed make a lot of sense, e.g. if the termination depends on these complex data structures. Even if the complex data structures are not relevant for the termination, this will still show whether the tool can figure that out.

Actually, if MemSafety-TerminCrafted is not to be a sub-category of the MemSafety category, then it would make sense to rename it so that this misunderstanding doesn't happen again. What about Termination-MemSafetyCrafted since, if I understand correctly, this should be a sub-category of the Termination category?

As far as I can tell, MemSafety-TerminCrafted is intended to be a sub-category of MemSafety, but I guess we will have to wait for an answer from @dbeyer for this. These are files from termination tasks, which also have a true memsafety verdict added. So in principle, it makes sense to also add them to the MemSafety category. SV-COMP will also not get worse by adding those tasks, it will just consume some more resources. But tools should be fast in proving these tasks correct (so there will be minimal resource usage), and if they are not, then this is evidence that the tasks are indeed interesting.

You can be right and it can be absurd at the same time. You can verify every task for all properties (but why?). If task doesn't manipulate memory, then it is memsafety. If task doesn't contain __VERIFIER_error(), then it is reachsafety. If task doesn't use integers, then there isn't overflow. etc

I never said this.

I apologize. I deduced from what you said.

I just skimmed through the discussion. In general, I do not like the idea of adding tasks aiming at X into category Y. A brief comment: It seems to me that if we adopt the spirit that having a benchmark that does not have anything related with memsafety in memsafety, then we can put any task into any category. Equivalently, we can completely cancel the categories and subcategories and have just a flat benchmark. However, then you completely loose information about which tool/approach is good at what and bad at what. Conversely, if a tool that is good at X heavily looses in the category devoted to X just because some tasks that are completely and utterly unrelated with X were added, what does this say to anybody? I do not like this.

I believe we have the "SoftwareSystems" as a category where all sorts of properties could be relevant and arbitrary source code features may appear. For all other categories, however, I'd hope we stick with what @vojnar said.

@versokova Could you please create a pull request that removes the files that you mentioned from the category MemSafety-TerminCrafted?

Since you cannot remove the tasks from the directory (because the directory is used by other categories), you need to change MemSafety-TerminCrafted.set such that it explicitly enumerates the remaining 11 tasks.

I can confirm that MemSafety-TerminCrafted is a sub-category of MemSafety,

I can also confirm that all those tasks in directory termination-crafted/ are part of several other categories, e.g., Termination and NoOverflows.

I do not think there is a contradiction in the discussion above.

@MartinSpiessl is completely right when he says that it does not hurt in principle if those tasks are part of MemSafety because your verifier should be quick in proving them.

@versokova is also completely right when she does not talk about absurdness or arguments, but rather that those tasks should be removed from the category because they might blur the picture because they just add noise. For that latter reason, I would remove them from the category (also in accordance with @vojnar's comment).

I believe all is discussed, solutions are found, and I am waiting for pull requests to make the fixes, if still desired.