neulab/gemini-benchmark

Re-run Mistral Evals w/ official Mistral-Instruct

We used a third-party model instead of the official Mixtral model in our original evaluation: https://twitter.com/arthurmensch/status/1737138144854606314

We should:

  1. Re-run the Mixtral evals for each task with the official Mixtral Instruct model
  2. Update the numbers, figures, and discussion in each task section of the paper
  3. Update the Zeno report to match the paper content

Here is a checklist of the tasks. Please check off each task once it is done!

  • Knowledge-based QA
  • Reasoning
  • Mathematics
  • Code Generation
  • Translation
  • Web Instruction Following