Tests incorrectly marked as redundant
Closed this issue · 20 comments
In the example GCD program. The only patch that is deemed plausible is
--- gcd.c
+++ gcd.c
@@ -10,7 +10,8 @@
if (a == 0) {
printf("%g\n", b);
}
- {
+return 0;
+ {
while (b != 0) {
if (a > b) {
a = a - b;
However, this is an incorrect patch as all positive tests fail. This positive tests are not run on this patch as they are incorrectly deemed redundant. To truly fix the incorrectly marked redundancy would require the system to recognize that the added line makes code unreachable. Would adding an option to toggle redundancy checks on tests be a valid partial fix?
Using the run_redundant_test
option in #279 causes a single, and different plausible patch to be found:
--- gcd.c
+++ gcd.c
@@ -8,7 +8,8 @@
b = atoi(argv[2]);
if (a == 0) {
- printf("%g\n", b);
+ printf("%g\n", b);return 0;
+
}
{
while (b != 0) {
This patch passes all tests and introduces no problems besides weird formatting and is thus a correct patch. #279 is just a workaround though, not a true fix.
This is interesting. I wonder if this has something to do with the way that I implemented append
. I'll look into this today.
Remind me: What configuration are you using?
version: '1.0'
program:
image: squareslab/tse2012:gcd
language: c
source-directory: /experiment/source
build-instructions:
time-limit: 10
steps:
- gcc gcd.c -o gcd
steps-for-coverage:
- gcc gcd.c -o gcd --coverage
tests:
type: genprog
workdir: /experiment
number-of-failing-tests: 1
number-of-passing-tests: 10
time-limit: 5
seed: 0
threads: 16
localization:
type: spectrum
metric: tarantula
algorithm:
type: exhaustive
coverage:
method:
type: gcov
files-to-instrument:
- gcd.c
transformations:
schemas:
- type: delete-statement
- type: replace-statement
- type: append-statement
optimizations:
ignore-equivalent-insertions: yes
ignore-dead-code: yes
ignore-string-equivalent-snippets: yes
resource-limits:
candidates: 100
For reference, here's the coverage information that I obtain for the example config:
(darjeeling) chris@chris-MS-7B77:~/tools/darjeeling$ darjeeling coverage example/gcd/repair.yml
{
n1 [FAIL]: {
gcd.c: 10..13; 16; 18..28; 30..32; 37; 40..41; 43..44; 47..48; 51
}
p1 [PASS]: {
gcd.c: 16; 18..28; 30..32; 37; 40..41; 43; 47..49; 51; 54; 57
}
p10 [PASS]: {
gcd.c: 16; 18..28; 30..32; 37; 40..41; 43; 47..49; 51; 54; 57
}
p2 [PASS]: {
gcd.c: 16; 18..28; 30..32; 37; 40..41; 43; 47..49; 51; 54; 57
}
p3 [PASS]: {
gcd.c: 16; 18..28; 30..32; 37; 40..41; 43; 47..49; 51; 54; 57
}
p4 [PASS]: {
gcd.c: 16; 18..28; 30..32; 37; 40..41; 43; 47..49; 51; 54; 57
}
p5 [PASS]: {
gcd.c: 16; 18..28; 30..32; 37; 40..41; 43; 47..49; 51; 54; 57
}
p6 [PASS]: {
gcd.c: 16; 18..28; 30..32; 37; 40..41; 43; 47..49; 51; 54; 57
}
p7 [PASS]: {
gcd.c: 16; 18..28; 30..32; 37; 40..41; 43; 47..49; 51; 54; 57
}
p8 [PASS]: {
gcd.c: 16; 18..28; 30..32; 37; 40..41; 43; 47..49; 51; 54; 57
}
p9 [PASS]: {
gcd.c: 16; 18..28; 30..32; 37; 40..41; 43; 47..49; 51; 54; 57
}
}
Interestingly, these numbers don't really make much sense. gcd.c
only has 25 lines. So, if nothing else, it looks like Darjeeling isn't correcting the line numbers to account for the instrumentation that is added to the top of the file.
After fixing an issue with coverage instrumentation, I get the following coverage output:
{
n1 [FAIL]: {
gcd.c: 4; 7..8; 10..11; 14..15; 18
}
p1 [PASS]: {
gcd.c: 4; 7..8; 10; 14..16; 18; 21; 24
}
p10 [PASS]: {
gcd.c: 4; 7..8; 10; 14..16; 18; 21; 24
}
p2 [PASS]: {
gcd.c: 4; 7..8; 10; 14..16; 18; 21; 24
}
p3 [PASS]: {
gcd.c: 4; 7..8; 10; 14..16; 18; 21; 24
}
p4 [PASS]: {
gcd.c: 4; 7..8; 10; 14..16; 18; 21; 24
}
p5 [PASS]: {
gcd.c: 4; 7..8; 10; 14..16; 18; 21; 24
}
p6 [PASS]: {
gcd.c: 4; 7..8; 10; 14..16; 18; 21; 24
}
p7 [PASS]: {
gcd.c: 4; 7..8; 10; 14..16; 18; 21; 24
}
p8 [PASS]: {
gcd.c: 4; 7..8; 10; 14..16; 18; 21; 24
}
p9 [PASS]: {
gcd.c: 4; 7..8; 10; 14..16; 18; 21; 24
}
}
The generated repair without running redundant tests is still incorrect unfortunately:
--- gcd.c
+++ gcd.c
@@ -10,7 +10,8 @@
if (a == 0) {
printf("%g\n", b);
}
- {
+printf("%g\n", a);
+ {
while (b != 0) {
if (a > b) {
a = a - b;
I can confirm that I'm seeing similar incorrect behavior:
--- gcd.c
+++ gcd.c
@@ -10,7 +10,11 @@
if (a == 0) {
printf("%g\n", b);
}
- {
+if (a == 0) {
+ printf("%g\n", b);
+ }
+
+ {
while (b != 0) {
if (a > b) {
a = a - b;
Okay. I think that I've figured out the issue. It does appear to have been introduced by the append-statement
operator. The problem is that the line that is "changed" by the operator is considered to be the line where we do the append. We don't have coverage for that line, because nothing is there. Instead, we should consider the line
of the target statement (i.e., the original if
statement in the code).
Specifically, the issue here is the lines_changed
method in the Candidate
class:
def lines_changed(self) -> List[FileLine]:
"""
Returns a list of source lines that are changed by this candidate
patch.
"""
replacements = \
map(lambda t: t.to_replacement(), self.transformations)
locations = [rep.location for rep in replacements]
return [FileLine(loc.filename, loc.start.line) for loc in locations]
The modified lines are determined by the location of the replacements rather than by the line
property of the transformations themselves. (There are some deeper issues here, too, such as Transformation
assuming that transformations are only associated with a single line.)
I've pushed a fix to the issue. The bad patch is no longer accepted and instead I find a plausible repair. Let me know if this resolves the issue on your end?
It works on my end as well!
I've found this issue to still occur in the zune
example within Darjeeling. The configuration being:
version: '1.0'
program:
image: darjeeling/example:zune
language: c
source-directory: /experiment/source
build-instructions:
time-limit: 15
steps:
- gcc zune.c -o zune
steps-for-coverage:
- gcc zune.c -o zune --coverage
tests:
type: genprog
workdir: /experiment
number-of-failing-tests: 4
number-of-passing-tests: 18
time-limit: 5
seed: 0
threads: 16
localization:
type: spectrum
metric: tarantula
algorithm:
type: exhaustive
coverage:
method:
type: gcov
files-to-instrument:
- zune.c
transformations:
schemas:
- type: delete-statement
- type: replace-statement
- type: append-statement
optimizations:
ignore-equivalent-insertions: yes
ignore-dead-code: yes
ignore-string-equivalent-snippets: yes
resource-limits:
candidates: 100
It produces the following incorrect patch:
--- zune.c
+++ zune.c
@@ -21,5 +21,5 @@
}
printf("current year is %d\n", year);
- return 0;
+
}
When I add run-redundant-tests
it does not find the incorrect patch, but instead finds no patch at all.
I'm having difficulty reproducing this one. I can't find a plausible patch regardless of the value of run-redundant-tests
. Can you share the output of darjeeling coverage repair.yml
?
Oh, I've realized it found the plausible patch due to some tweaks I was making. It's a bit concerning that it can't find it at all though.
Oh, I've realized it found the plausible patch due to some tweaks I was making. It's a bit concerning that it can't find it at all though.
If I recall correctly, this is one of the bugs that Darjeeling can't fix because it requires transformations that are beyond the repair model specified in the example configuration. In that respect, it probably doesn't make for a good example bug 🙂
The original GenProg does manage to sneakily fix the issue because (a) it preprocesses the original source code, and, more importantly, (b) it uses CIL to represent ASTs. CIL performs certain transformations on the program to make analysis much easier. The result is that GenProg actually ends up doing something that's somewhat closer to expression-level repair. (In reality, CIL introduces lots of temporary variables, and the repairs that GenProg can actually make are a bit of a roll of the dice depending on the exact names and ordering of variables.)
Will Darjeeling be able to eventually sneak that fix in as well?
One way or another, Darjeeling should be able to fix that bug soon. The nicest approach, and one most likely to arrive in the next few weeks, is to use a finer grained transformation to fix the bug. Alternatively, it wouldn't be too much work to add an option to Darjeeling that allows for preprocessing of certain C/C++ projects to mimic the behavior of the original GenProg. The problem with that particular approach is that, like the original GenProg, the patches would be generated for the preprocessed codebase and it would be up to the developer to transfer those patches to the original codebase. (With some clever mapping of original/pre-processed code, it might be possible to do that transformation automatically, but I imagine that it isn't easy.)
Interesting. Good to know, thank you for explaining!