Evaluation task: Code repair
ruiAzevedo19 opened this issue · 0 comments
Goal
Given source code with compilation errors, the model needs to repair the code so that it compiles. The response is validated by executing predefined tests, which ensure that the implementation itself has not been altered.
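The "compiles" half of this goal can be approximated for a single Go file by parsing and type-checking it with the standard `go/parser` and `go/types` packages. This check could confirm that each testdata mistake really breaks compilation and that a repaired file compiles again. It is a minimal sketch, not the project's validation. The helper `compileCheck` and the `plain` package are hypothetical. Type-checking does not build or link the code, and it does not replace running the predefined tests.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/importer"
	"go/parser"
	"go/token"
	"go/types"
)

// compileCheck (hypothetical helper) parses and type-checks a single-file Go
// package. A nil result approximates "the source code compiles"; it does not
// build, link, or run the package.
func compileCheck(filename, src string) error {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, filename, src, parser.AllErrors)
	if err != nil {
		// Syntax errors, e.g. a missing opening bracket.
		return err
	}
	conf := types.Config{Importer: importer.Default()}
	// Type errors, e.g. a missing import, a wrong type, or an undeclared variable.
	_, err = conf.Check(file.Name.Name, fset, []*ast.File{file}, nil)
	return err
}

func main() {
	repaired := "package plain\n\nfunc Plain(x int) int {\n\treturn x\n}\n"
	// Mistake "variable is not declared": y is never declared.
	broken := "package plain\n\nfunc Plain(x int) int {\n\treturn y\n}\n"

	fmt.Println(compileCheck("plain.go", repaired) == nil) // true
	fmt.Println(compileCheck("plain.go", broken) == nil)   // false
}
```

Because the importer is only consulted for import declarations, sources without imports type-check without any export data being loaded.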
PRs
Follow-up
TODOs
- Testdata
  - Examples
    - function opening brackets are missing
    - type is missing
    - type is wrong
    - import is missing
    - variable is not declared
  - For each case:
    - generate tests with `symflower unit-tests`
    - check that the tests are passing
    - add a mistake to the implementation
    - commit
- Implementation
  - Define a new task identifier: `code-repair`
    - For the `symflower` model, define this task as unsupported, because we always generate deterministic tests
    - For LLM models
      - Define the new task as supported
  - Create an interface for tasks
    - Interface: `Task`
      - Methods
        - `Run(repository) (assessment, err)`: runs the task for the given repository and returns the assessments
        - `Identifier`: returns the task identifier
  - Define tasks
    - `TaskWriteTests`
      - The `Run` method is basically what we already have in `evaluate/repository.go:Evaluate`
      - Remove `evaluate/repository.go:Evaluate`, since it is now part of the task
    - `TaskCodeRepair`
      - The `Run` method is responsible for running the task only on source code files (test files and other files are filtered out)
        - The method must range over the sub-directories in the `mistakes` testdata and run the code repair task for each sub-directory
        - Add two methods to the language interface
          - `DefaultFileExtension` returns the language file extension
          - `DefaultTestFileSuffix` returns the language test file suffix, e.g., `_test.go` for Go and `Test.java` for Java
          - Note: this will be used to easily filter out files
  - Calling the `Run` method
    - Replace the call `temporaryRepository.Evaluate(...)` in `evaluate/evaluate.go:Evaluate` with the task's `Run` method
      - We are ranging over `temporaryRepository.Tasks`, so we need a function `TaskForIdentifier(taskIdentifier)` that, given a task identifier, returns the task struct
- Review and merge #197
- Accommodate the code repair logic to changes made in #197
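The task identifier item under "Implementation" (`code-repair` unsupported for the `symflower` model, supported for LLM models) could look roughly like the following Go sketch. The `Model` interface, its `ID` and `IsTaskSupported` methods, the `SymflowerModel` and `LLMModel` types, and the `write-tests` identifier value are all hypothetical names chosen for illustration, not the repository's actual API.

```go
package main

import "fmt"

// TaskIdentifier names an evaluation task.
type TaskIdentifier string

const (
	IdentifierWriteTests TaskIdentifier = "write-tests" // Hypothetical value.
	IdentifierCodeRepair TaskIdentifier = "code-repair"
)

// Model is a hypothetical, reduced model interface.
type Model interface {
	// ID returns the model identifier.
	ID() string
	// IsTaskSupported reports whether the model should be evaluated on the task.
	IsTaskSupported(taskIdentifier TaskIdentifier) bool
}

// SymflowerModel always generates deterministic tests, so code repair is unsupported.
type SymflowerModel struct{}

func (SymflowerModel) ID() string { return "symflower" }

func (SymflowerModel) IsTaskSupported(taskIdentifier TaskIdentifier) bool {
	return taskIdentifier == IdentifierWriteTests
}

// LLMModel stands in for any LLM-backed model; it supports both tasks.
type LLMModel struct {
	Name string
}

func (m LLMModel) ID() string { return m.Name }

func (LLMModel) IsTaskSupported(taskIdentifier TaskIdentifier) bool {
	switch taskIdentifier {
	case IdentifierWriteTests, IdentifierCodeRepair:
		return true
	default:
		return false
	}
}

func main() {
	for _, model := range []Model{SymflowerModel{}, LLMModel{Name: "some-llm"}} {
		fmt.Println(model.ID(), model.IsTaskSupported(IdentifierCodeRepair))
	}
	// Output:
	// symflower false
	// some-llm true
}
```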
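The `Task` interface, the two tasks, `TaskForIdentifier`, and the loop that replaces `temporaryRepository.Evaluate(...)` in the task list above could be sketched in Go as follows. This is a minimal sketch under assumptions. `Repository`, `Assessment`, `TaskIdentifier`, `evaluateRepository`, and the `write-tests` identifier value are hypothetical stand-ins. Both `Run` methods are placeholders that return empty assessments instead of performing a real evaluation.

```go
package main

import "fmt"

// TaskIdentifier names an evaluation task.
type TaskIdentifier string

const (
	IdentifierWriteTests TaskIdentifier = "write-tests" // Hypothetical value.
	IdentifierCodeRepair TaskIdentifier = "code-repair"
)

// Repository is a hypothetical stand-in for a testdata repository.
type Repository struct {
	Name  string
	Tasks []TaskIdentifier // Tasks to evaluate for this repository.
}

// Assessment is a hypothetical stand-in for the collected metrics.
type Assessment map[string]int

// Task is implemented by every evaluation task.
type Task interface {
	// Identifier returns the task identifier.
	Identifier() TaskIdentifier
	// Run runs the task for the given repository and returns the assessments.
	Run(repository Repository) (Assessment, error)
}

// TaskWriteTests would wrap the logic of evaluate/repository.go:Evaluate.
type TaskWriteTests struct{}

func (TaskWriteTests) Identifier() TaskIdentifier { return IdentifierWriteTests }

func (TaskWriteTests) Run(repository Repository) (Assessment, error) {
	return Assessment{}, nil // Placeholder: no real evaluation happens here.
}

// TaskCodeRepair would run the repair task on the repository's source files.
type TaskCodeRepair struct{}

func (TaskCodeRepair) Identifier() TaskIdentifier { return IdentifierCodeRepair }

func (TaskCodeRepair) Run(repository Repository) (Assessment, error) {
	return Assessment{}, nil // Placeholder: no real evaluation happens here.
}

// TaskForIdentifier returns the task for the given identifier.
func TaskForIdentifier(taskIdentifier TaskIdentifier) (Task, error) {
	switch taskIdentifier {
	case IdentifierWriteTests:
		return TaskWriteTests{}, nil
	case IdentifierCodeRepair:
		return TaskCodeRepair{}, nil
	default:
		return nil, fmt.Errorf("unknown task identifier %q", taskIdentifier)
	}
}

// evaluateRepository sketches the loop that replaces the call to
// temporaryRepository.Evaluate(...) in evaluate/evaluate.go:Evaluate.
func evaluateRepository(repository Repository) (map[TaskIdentifier]Assessment, error) {
	assessments := map[TaskIdentifier]Assessment{}
	for _, taskIdentifier := range repository.Tasks {
		task, err := TaskForIdentifier(taskIdentifier)
		if err != nil {
			return nil, err
		}
		assessment, err := task.Run(repository)
		if err != nil {
			return nil, fmt.Errorf("task %q failed for repository %q: %w", taskIdentifier, repository.Name, err)
		}
		assessments[task.Identifier()] = assessment
	}
	return assessments, nil
}

func main() {
	repository := Repository{Name: "plain", Tasks: []TaskIdentifier{IdentifierWriteTests, IdentifierCodeRepair}}
	assessments, err := evaluateRepository(repository)
	fmt.Println(len(assessments), err) // 2 <nil>
}
```

A `switch` keeps the identifier-to-task mapping explicit. A package-level map from identifier to task would work equally well if tasks register themselves.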