symflower/eval-dev-quality

Keep individual coverage files and LLM query/responses

zimmski opened this issue · 3 comments

We need to keep all interactions. That includes the coverage files we are collecting.

What about #181 then? Close?

I think the cleanest solution would be to use logrus "Hooks". That way we can keep most of our logging as is, but, e.g., log prompts with a special type=prompt attribute and add a hook that also writes the prompt content into a separate file.

Planning

Introduce structured logging so that there is a single place where artifacts such as model responses or coverage files are saved to disk. With structured logging we can define keys/attributes like model, repository, task, etc., which then determine how we log and save artifacts.

Which logging library?

After some research there are two candidates, https://github.com/sirupsen/logrus and https://pkg.go.dev/golang.org/x/exp/slog.
Logrus has "hooks" to act on entries with specific attributes, and for slog one needs to implement a custom "Handler".
Since we have a hierarchical logging structure, the "Handler" approach is preferable: the handler decides when it is necessary to log into a new file, and there can be a hierarchy of handlers. With hooks, a single hook would need to manage several files at once, e.g. one for every model.
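
To illustrate the "hierarchy of handlers" idea, here is a minimal sketch of a custom slog handler that forwards every record to its children and adds a per-model child once a "model" attribute is attached. It assumes the standard library's log/slog (Go 1.21+), which has the same Handler interface as golang.org/x/exp/slog. The names fanoutHandler, open and demo are hypothetical, and in-memory buffers stand in for the per-model log files.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"io"
	"log/slog"
)

// fanoutHandler (hypothetical name) forwards each record to all of its children.
// Attaching a "model" attribute via With adds a child that writes to that model's sink.
type fanoutHandler struct {
	handlers []slog.Handler
	open     func(model string) io.Writer // assumption: returns the sink for a model
}

func (h *fanoutHandler) Enabled(ctx context.Context, level slog.Level) bool {
	for _, child := range h.handlers {
		if child.Enabled(ctx, level) {
			return true
		}
	}
	return false
}

func (h *fanoutHandler) Handle(ctx context.Context, r slog.Record) error {
	for _, child := range h.handlers {
		if !child.Enabled(ctx, r.Level) {
			continue
		}
		if err := child.Handle(ctx, r.Clone()); err != nil {
			return err
		}
	}
	return nil
}

func (h *fanoutHandler) WithAttrs(attrs []slog.Attr) slog.Handler {
	next := &fanoutHandler{open: h.open}
	for _, child := range h.handlers {
		next.handlers = append(next.handlers, child.WithAttrs(attrs))
	}
	// The handler, not the caller, decides when a new sink is needed.
	for _, a := range attrs {
		if a.Key == "model" {
			sink := h.open(a.Value.String())
			next.handlers = append(next.handlers, slog.NewTextHandler(sink, nil))
		}
	}
	return next
}

func (h *fanoutHandler) WithGroup(name string) slog.Handler {
	next := &fanoutHandler{open: h.open}
	for _, child := range h.handlers {
		next.handlers = append(next.handlers, child.WithGroup(name))
	}
	return next
}

// demo logs one shared record and one record per model, then returns
// the shared log and the content of every per-model sink.
func demo() (string, map[string]string) {
	var shared bytes.Buffer
	sinks := map[string]*bytes.Buffer{}
	open := func(model string) io.Writer {
		if sinks[model] == nil {
			sinks[model] = &bytes.Buffer{}
		}
		return sinks[model]
	}
	root := slog.New(&fanoutHandler{
		handlers: []slog.Handler{slog.NewTextHandler(&shared, nil)},
		open:     open,
	})
	root.Info("evaluation started")
	root.With("model", "model-a").Info("querying model")
	root.With("model", "model-b").Info("querying model")

	perModel := map[string]string{}
	for model, buf := range sinks {
		perModel[model] = buf.String()
	}
	return shared.String(), perModel
}

func main() {
	shared, perModel := demo()
	fmt.Print(shared)
	fmt.Println(len(perModel), "per-model sinks")
}
```

The shared sink still receives everything, so the current logging behavior would stay intact, while each model-scoped logger additionally writes to its own sink. A logrus hook would instead have to look up the right file for every entry itself.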

Tasks

  • Switch to the new logging library slog (without changing the current logging behavior)
    • Set attributes like model, task, repository, etc. accordingly
  • Write artifacts like model responses to disk.
    • LLM responses
    • Coverage files
    • ...
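
A rough sketch of how the second task could look, assuming the standard library's log/slog: a wrapping handler remembers the attributes set via With and, when a record carries an artifact, writes it to a directory derived from them. The attribute keys "artifact" and "content", the model/repository/task directory layout, and the names artifactHandler and demo are all assumptions made for illustration, not something decided in this issue.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"log/slog"
	"os"
	"path/filepath"
)

// artifactHandler (hypothetical name) wraps another handler and saves artifacts to disk.
type artifactHandler struct {
	next  slog.Handler
	root  string            // base directory for all artifacts
	attrs map[string]string // attributes collected via WithAttrs, e.g. model, repository, task
}

func (h *artifactHandler) Enabled(ctx context.Context, level slog.Level) bool {
	return h.next.Enabled(ctx, level)
}

func (h *artifactHandler) Handle(ctx context.Context, r slog.Record) error {
	var name, content string
	r.Attrs(func(a slog.Attr) bool {
		switch a.Key {
		case "artifact": // assumed key: file name of the artifact
			name = a.Value.String()
		case "content": // assumed key: content of the artifact
			content = a.Value.String()
		}
		return true
	})
	if name != "" {
		// Assumed layout: <root>/<model>/<repository>/<task>/<artifact>.
		// Values are used as path elements as-is; real names may need sanitizing.
		dir := filepath.Join(h.root, h.attrs["model"], h.attrs["repository"], h.attrs["task"])
		if err := os.MkdirAll(dir, 0o755); err != nil {
			return err
		}
		if err := os.WriteFile(filepath.Join(dir, name), []byte(content), 0o644); err != nil {
			return err
		}
	}
	// The record is still passed on unchanged, so regular logging keeps working.
	return h.next.Handle(ctx, r)
}

func (h *artifactHandler) WithAttrs(attrs []slog.Attr) slog.Handler {
	next := &artifactHandler{next: h.next.WithAttrs(attrs), root: h.root, attrs: map[string]string{}}
	for k, v := range h.attrs {
		next.attrs[k] = v
	}
	for _, a := range attrs {
		next.attrs[a.Key] = a.Value.String()
	}
	return next
}

func (h *artifactHandler) WithGroup(name string) slog.Handler {
	return &artifactHandler{next: h.next.WithGroup(name), root: h.root, attrs: h.attrs}
}

// demo logs one model response as an artifact and reads the written file back.
func demo() (string, string, error) {
	root, err := os.MkdirTemp("", "artifacts")
	if err != nil {
		return "", "", err
	}
	handler := &artifactHandler{
		next:  slog.NewTextHandler(io.Discard, nil),
		root:  root,
		attrs: map[string]string{},
	}
	logger := slog.New(handler).With("model", "model-a", "repository", "plain", "task", "write-tests")
	logger.Info("model responded", "artifact", "response.log", "content", "package plain\n")

	path := filepath.Join(root, "model-a", "plain", "write-tests", "response.log")
	data, err := os.ReadFile(path)
	return path, string(data), err
}

func main() {
	path, content, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("wrote", len(content), "bytes to", path)
}
```

Coverage files could go through the same path by logging them with a different artifact name, which would keep the "single place" for saving artifacts that the planning section asks for.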