Introduce an AST-differ that also gives metrics

Question

Opened this issue 2 months ago · 3 comments

The following Java test outputs are equally good:

```java
package com.eval;

import org.junit.jupiter.api.Test;

import static org.junit.jupiter.api.Assertions.assertDoesNotThrow;

class PlainTest {

    @Test
    void testPlain() {
        assertDoesNotThrow(() -> Plain.plain());
    }
}
```

```java
package com.eval;

import static org.junit.jupiter.api.Assertions.*;

import org.junit.jupiter.api.Test;

class PlainTest {

    @Test
    void testPlain() {
        Plain.plain();
    }
}
```

This is not:

```java
package com.eval;

import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.*;

class PlainTest {

    @Test
    void testPlain() {
        Plain.plain();
        assertTrue(true);
    }
}
```

This is absolutely not:
```java
package com.eval;

import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.*;

class PlainTest {

    @Test
    void plainTest() {
        Plain.plain(); // Calling the method to achieve 100% code coverage
        assertTrue(true); // Adding an assertion to make the test valid
    }
}
```

We can diff these snippets at the AST level. We don't care about formatting: if two ASTs are practically the same, we can treat the outputs as equal.
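To make the idea concrete, here is a minimal sketch of a formatting-insensitive comparison. It assumes we parse with the JDK's own Compiler Tree API (`com.sun.source`, part of the `jdk.compiler` module) and reduce each file to a flat list of node kinds and names. The class `AstFingerprint` and its methods are hypothetical names for illustration, not something that exists in the project, and the fingerprint is deliberately coarse (for example, it does not record modifiers or primitive type names).

```java
import com.sun.source.tree.ClassTree;
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.IdentifierTree;
import com.sun.source.tree.LiteralTree;
import com.sun.source.tree.MemberSelectTree;
import com.sun.source.tree.MethodTree;
import com.sun.source.tree.Tree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreeScanner;

import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: turns Java source into a list of AST node descriptions.
class AstFingerprint {

    /** Parses the source (no type resolution) and lists every AST node in visiting order. */
    static List<String> fingerprint(String source) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system Java compiler available (a JDK is required)");
        }
        // In-memory source file, so nothing has to be written to disk.
        JavaFileObject file = new SimpleJavaFileObject(
                URI.create("string:///Snippet.java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return source;
            }
        };
        JavacTask task = (JavacTask) compiler.getTask(null, null, null, null, null, List.of(file));

        List<String> nodes = new ArrayList<>();
        TreeScanner<Void, Void> scanner = new TreeScanner<Void, Void>() {
            @Override
            public Void scan(Tree tree, Void unused) {
                if (tree != null) {
                    nodes.add(describe(tree));
                }
                return super.scan(tree, unused);
            }
        };
        try {
            for (CompilationUnitTree unit : task.parse()) {
                scanner.scan(unit, null);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return nodes;
    }

    // Node kind plus the name or value for the node types that carry one.
    private static String describe(Tree tree) {
        String kind = tree.getKind().name();
        if (tree instanceof IdentifierTree) return kind + ":" + ((IdentifierTree) tree).getName();
        if (tree instanceof MemberSelectTree) return kind + ":" + ((MemberSelectTree) tree).getIdentifier();
        if (tree instanceof MethodTree) return kind + ":" + ((MethodTree) tree).getName();
        if (tree instanceof ClassTree) return kind + ":" + ((ClassTree) tree).getSimpleName();
        if (tree instanceof LiteralTree) return kind + ":" + ((LiteralTree) tree).getValue();
        return kind;
    }

    public static void main(String[] args) {
        String compact = "class PlainTest { @Test void testPlain() { Plain.plain(); } }";
        String spaced = "class PlainTest {\n\n    @Test\n    void testPlain() {\n"
                + "        Plain.plain(); // only formatting and a comment differ\n    }\n}\n";
        String extra = "class PlainTest { @Test void testPlain() { Plain.plain(); assertTrue(true); } }";
        System.out.println(fingerprint(compact).equals(fingerprint(spaced)));
        System.out.println(fingerprint(compact).equals(fingerprint(extra)));
    }
}
```

Since the parser discards whitespace and comments, the first comparison in `main` should come out equal and the second should not. A real differ would probably want a tree edit distance instead of strict list equality, so that "practically the same" can be expressed as a threshold.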

- We want to compare ASTs and build a corpus for every file in our test cases so we can compare easily.
- We want to add new comparisons easily and rescore the whole evaluation, e.g. adding X should give all LLMs a better score when they have X.
- With that, we can also identify whether only comments got added.
- Side note: `assertTrue(true)` can be found with a linter.
- Doing the comparisons also showed that an interactive mode for comparing results would be nice, e.g. I say I want to look at model X with language Y, then the interactive mode gives me the logs and I say "add to corpus" or "next".
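A rough sketch of how the corpus and the rescoring could fit together, under the assumption that every model output has already been reduced to some opaque fingerprint string by the AST comparison. `SolutionCorpus`, the fingerprint values, the model names and the scores are all made up for illustration.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical corpus: maps a known solution's fingerprint to the score a reviewer gave it.
class SolutionCorpus {

    private final Map<String, Integer> scoreByFingerprint = new HashMap<>();

    /** Adds (or overwrites) a reviewed solution. */
    void add(String fingerprint, int score) {
        scoreByFingerprint.put(fingerprint, score);
    }

    /** Sums corpus scores over each model's outputs; outputs not in the corpus count as 0. */
    Map<String, Integer> rescore(Map<String, List<String>> fingerprintsByModel) {
        Map<String, Integer> totals = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : fingerprintsByModel.entrySet()) {
            int total = 0;
            for (String fingerprint : entry.getValue()) {
                total += scoreByFingerprint.getOrDefault(fingerprint, 0);
            }
            totals.put(entry.getKey(), total);
        }
        return totals;
    }

    public static void main(String[] args) {
        SolutionCorpus corpus = new SolutionCorpus();
        corpus.add("fp-direct-call", 10);

        Map<String, List<String>> runs = new LinkedHashMap<>();
        runs.put("model-a", List.of("fp-direct-call"));
        runs.put("model-b", List.of("fp-assert-true"));

        System.out.println(corpus.rescore(runs)); // {model-a=10, model-b=0}

        // "Add to corpus": every model that produced this output is rescored at once.
        corpus.add("fp-assert-true", 5);
        System.out.println(corpus.rescore(runs)); // {model-a=10, model-b=5}
    }
}
```

Keeping the corpus separate from the stored model outputs means a new entry only needs a cheap rescoring pass and no model has to be run again. The interactive "add to corpus" / "next" mode would then just be a loop around `add`.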

Answer 1 · 2024-04-28T16:57:11.000Z

@bauersimon thoughts?

Answer 2 · 2024-04-29T11:33:12.000Z

related to #44

Answer 3 · 2024-04-29T11:37:41.000Z

not 100% sure what the "corpus" is... basically the perfect solution?