symflower/eval-dev-quality

Evaluation task: Transpile

Goal

Given a source file, the model needs to transpile it from Java to Go, or from Go to Java. The response is validated by executing predefined tests against the transpiled code to make sure the implementation is correct.

TODOs

  • Testdata
    • Create a new repository transpile for both Golang and Java
    • Testcases
      • Considerations:
        • define multiple packages inside transpile with the different testing scenarios
        • each package must contain an implementation and the corresponding test file
        • pick some examples from the light repository with different levels of complexity
      • cascadingIfElse
        • Golang
        • Java
      • sort
        • Golang
        • Java
      • binarySearch
        • Golang
        • Java
      • balancedBrackets
        • Golang
        • Java
      • pascalsTriangle
        • Golang
        • Java
  • Implementation
    • create a new task identifier transpile
    • mark the new task as unsupported for symflower model
    • mark the new task as supported for llm model
    • create a task-transpile.go to implement the task behavior
    • create a prompt for the new task
    • create a transpileSourceCodeFile in model/llm/llm.go that writes the generated code to a file
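
The first three implementation items could look roughly like the sketch below. It is only an illustration: `Identifier`, `IdentifierTranspile`, `model` and `IsTaskSupported` are hypothetical names chosen for this example, not the repository's actual API.

```go
package main

import "fmt"

// Identifier is a hypothetical task identifier type.
// The repository's real type and constant names may differ.
type Identifier string

// IdentifierTranspile is the assumed identifier for the new task.
const IdentifierTranspile Identifier = "transpile"

// model is a hypothetical, minimal stand-in for a model definition.
type model struct {
	name      string
	supported map[Identifier]bool
}

// IsTaskSupported reports whether the model can run the given task.
// Tasks that are not listed count as unsupported.
func (m model) IsTaskSupported(id Identifier) bool {
	return m.supported[id]
}

func main() {
	// The symflower model does not list the task, the llm model does.
	symflower := model{name: "symflower", supported: map[Identifier]bool{}}
	llm := model{name: "llm", supported: map[Identifier]bool{IdentifierTranspile: true}}

	for _, m := range []model{symflower, llm} {
		fmt.Printf("%s supports %s: %t\n", m.name, IdentifierTranspile, m.IsTaskSupported(IdentifierTranspile))
	}
}
```

Treating unlisted tasks as unsupported means only the llm model has to opt in to the new task.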

So in this case a test case contains the implementation in language A and a test file in language B, right?

We need to ensure that the transpiled implementation in language B actually works with the tests, so we need to show the model the signature in language B that it has to conform to. For each example I would therefore also add an implementation file in language B that already contains the signature, and show that file in the prompt.

I think the refactoring of the prompting is a bit too ambitious. Really the only thing that changes for each task is the prompt template and the context (and we even embed parts of the context like the source file to deduplicate code).

It would be nice to have a helper function that just takes both of these and applies the context to the template. I think we can get away with an any argument for the context, because it is just a context and there is no common method for it. The templating will fail anyway if it cannot find the context values it needs, so that check already happens and there is no need to add another one. I am not sure we need to introduce an interface at all.