neulab/gemini-benchmark

Re-run with Gemini's safety settings off


Gemini has safety filters on by default, but this may hurt downstream accuracy. We should try re-running the Gemini evals with safety settings off (reference: BerriAI/litellm#1190).
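For reference, here is a minimal sketch of what the call might look like, assuming litellm passes `safety_settings` through to the Gemini API as discussed in BerriAI/litellm#1190 (the exact categories and model string should be double-checked against our eval harness):

```python
from litellm import completion

# Set every harm category to BLOCK_NONE so responses are not filtered.
# Category names follow the Gemini API; adjust if our harness differs.
SAFETY_SETTINGS_OFF = [
    {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_NONE"},
    {"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_NONE"},
    {"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "BLOCK_NONE"},
    {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_NONE"},
]

response = completion(
    model="gemini/gemini-pro",
    messages=[{"role": "user", "content": "Example eval prompt"}],
    safety_settings=SAFETY_SETTINGS_OFF,
)
print(response.choices[0].message.content)
```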

We should:

  1. Re-run the Gemini evals for all tasks with safety settings off (we will name this model gemini-pro and the model with safety settings on gemini-pro-filtered)
  2. Update the numbers, figures, and discussion in each task section of the paper with the new gemini-pro numbers
  3. Where appropriate (maybe in MMLU and Translation?), include a brief discussion of the effect of safety filtering
  4. Update the Zeno report to match the paper content

Here is a checklist for the tasks; please check off each task when it is done!

  • Knowledge-based QA
  • Reasoning
  • Mathematics
  • Code Generation
  • Translation
  • Web Instruction Following