symflower/eval-dev-quality

Use a JSON configuration file to set up an evaluation run

ruiAzevedo19 opened this issue · 1 comments

Goal: Allow exporting and updating a JSON configuration file that is used to run an evaluation. We want to automatically see which models/repositories/tasks are new/gone. This allows us to commit the configuration we are using for a full evaluation of an eval version into the repository.

We want to store:

  • available models for providers
  • selected models for providers
  • available repositories (with their tasks)
  • selected repositories (with their tasks)

We want to load (?):

  • selected models for providers
  • selected repositories
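To make the store/load split above more concrete, here is a rough sketch of what the file could map to in Go, plus a helper for the "new/gone" check. All type names, field names and JSON keys (`Configuration`, `available`, `selected`, `Diff`, ...) are assumptions for discussion and not existing code in the repository.

```go
package main

import (
	"encoding/json"
	"sort"
)

// Configuration is a hypothetical layout for the evaluation configuration file.
type Configuration struct {
	Models       ModelsConfiguration       `json:"models"`
	Repositories RepositoriesConfiguration `json:"repositories"`
}

// ModelsConfiguration holds model IDs (assumed to be prefixed by their provider).
type ModelsConfiguration struct {
	Available []string `json:"available"`
	Selected  []string `json:"selected"`
}

// RepositoriesConfiguration maps a repository path to its task names.
type RepositoriesConfiguration struct {
	Available map[string][]string `json:"available"`
	Selected  map[string][]string `json:"selected"`
}

// Marshal renders the configuration as indented JSON so that diffs stay readable when committed.
func (c Configuration) Marshal() ([]byte, error) {
	return json.MarshalIndent(c, "", "  ")
}

// Diff returns the entries that are only in "current" (added) and only in "previous" (removed).
// Both results are sorted and free of duplicates.
func Diff(previous, current []string) (added, removed []string) {
	inPrevious := make(map[string]bool, len(previous))
	for _, name := range previous {
		inPrevious[name] = true
	}

	inCurrent := make(map[string]bool, len(current))
	for _, name := range current {
		if !inPrevious[name] && !inCurrent[name] {
			added = append(added, name)
		}
		inCurrent[name] = true
	}

	reported := make(map[string]bool)
	for _, name := range previous {
		if !inCurrent[name] && !reported[name] {
			removed = append(removed, name)
			reported[name] = true
		}
	}

	sort.Strings(added)
	sort.Strings(removed)

	return added, removed
}
```

Calling `Diff(oldConfiguration.Models.Available, freshlyQueriedModels)` when updating the file would give the lists of new and gone models. The same helper could be applied to repository paths and task names.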

TODO

Iteration 1

  • Return all available models by querying the provider's APIs
    • Provider.Models()
    • Providers
      • OpenRouter: https://openrouter.ai/api/v1/models (already implemented)
      • Ollama: http://127.0.0.1:11434/api/tags (already implemented)
    • The Ollama endpoint only lists the models that are installed locally, not the models that are generally available, so ignore Ollama for now (until #283 is in).
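For reference, a minimal sketch of how the two responses could be reduced to a list of model names. The `Provider` interface and the function names here are hypothetical and only mirror the `Provider.Models()` idea from the list above. The sketch assumes that OpenRouter returns its models under a top-level `data` array with an `id` field and that Ollama returns them under `models` with a `name` field. Fetching the body over HTTP is left out on purpose.

```go
package main

import (
	"encoding/json"
)

// Provider is a hypothetical interface, the real one in the repository may look different.
type Provider interface {
	// ID returns the identifier of the provider, e.g. "openrouter".
	ID() string
	// Models returns the names of all models the provider reports.
	Models() ([]string, error)
}

// ParseOpenRouterModels extracts the model IDs from a "/api/v1/models" response body.
func ParseOpenRouterModels(body []byte) ([]string, error) {
	var response struct {
		Data []struct {
			ID string `json:"id"`
		} `json:"data"`
	}
	if err := json.Unmarshal(body, &response); err != nil {
		return nil, err
	}

	ids := make([]string, 0, len(response.Data))
	for _, model := range response.Data {
		ids = append(ids, model.ID)
	}

	return ids, nil
}

// ParseOllamaTags extracts the names of the locally installed models from an "/api/tags" response body.
func ParseOllamaTags(body []byte) ([]string, error) {
	var response struct {
		Models []struct {
			Name string `json:"name"`
		} `json:"models"`
	}
	if err := json.Unmarshal(body, &response); err != nil {
		return nil, err
	}

	names := make([]string, 0, len(response.Models))
	for _, model := range response.Models {
		names = append(names, model.Name)
	}

	return names, nil
}
```

Keeping the parsing separate from the HTTP call would also make it easy to unit test against recorded responses.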

Iteration 2

  • Store the available models in the JSON file
  • Store the selected models in the JSON file

Iteration 3

  • Store the available repositories (with tasks) in the JSON file
  • Store the selected repositories (with tasks) in the JSON file

Iteration 4

  • Handle JSON file as configuration argument to the evaluation
    • load selected models
    • load selected repositories
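Loading could be as small as the following sketch. It only reads the selected models and repositories and ignores the "available" entries, since those are informational. The type name, the function names and the JSON keys are assumptions and would need to match whatever layout Iterations 2 and 3 end up writing.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// EvaluationSelection is a hypothetical view on the configuration file that only contains what an evaluation run needs.
type EvaluationSelection struct {
	Models struct {
		Selected []string `json:"selected"`
	} `json:"models"`
	Repositories struct {
		// Selected maps a repository path to the tasks that should run on it.
		Selected map[string][]string `json:"selected"`
	} `json:"repositories"`
}

// ParseSelection decodes the selection from the raw JSON content of a configuration file.
func ParseSelection(data []byte) (*EvaluationSelection, error) {
	var selection EvaluationSelection
	if err := json.Unmarshal(data, &selection); err != nil {
		return nil, fmt.Errorf("parsing configuration: %w", err)
	}

	return &selection, nil
}

// LoadSelection reads the configuration file at the given path and decodes its selection.
func LoadSelection(path string) (*EvaluationSelection, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, fmt.Errorf("reading configuration %q: %w", path, err)
	}

	return ParseSelection(data)
}
```

The evaluation command could then take the file path as an argument and use the result instead of (or as defaults for) the model and repository arguments. How the two should interact is still open.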

Iteration 5

  • Also store and load custom provider URLs so that they don't need to be carried over manually. Follow-up: #307

@ahumenberger plz check if this makes sense to u 🙏