bigscience-workshop/t-zero

BIG-bench evaluation

thesofakillers opened this issue · 1 comment

Hi, thank you for your work.

From what I understand, a large portion of the evaluation was done on the BIG-bench benchmark.
How would we run the evaluation to reproduce those results? It is unclear from the evaluation README.

Thank you!

Hi @thesofakillers, thanks for your question!

We used a fork of the BIG-bench repository: https://github.com/lintangsutawika/BIG-bench/tree/t5 (which is by now quite far behind upstream, admittedly).

Now that BIG-bench is also available through the HF datasets library (https://huggingface.co/datasets/bigbench), I suspect you can tweak the https://github.com/bigscience-workshop/t-zero/blob/master/evaluation/run_eval.py script to get the BIG-bench numbers.
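
For what it's worth, here is a minimal sketch of what that tweak could look like: it scores each answer choice with T0 and picks the most likely one (rank classification), which is the same idea `run_eval.py` implements. The task name and split are placeholders, and the field names (`inputs`, `multiple_choice_targets`, `multiple_choice_scores`) are taken from the HF `bigbench` dataset card, so double-check them for the task you pick:

```python
# A rough sketch, not the exact T0 pipeline: rank-classification scoring
# of a BIG-bench multiple-choice task, in the spirit of evaluation/run_eval.py.
import torch
from datasets import load_dataset
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Placeholder task/split; substitute the BIG-bench task you care about.
# Depending on your `datasets` version you may need trust_remote_code=True.
dataset = load_dataset("bigbench", "abstract_narrative_understanding", split="validation")

tokenizer = AutoTokenizer.from_pretrained("bigscience/T0_3B")
model = AutoModelForSeq2SeqLM.from_pretrained("bigscience/T0_3B")
model.eval()

correct = 0
for example in dataset:
    input_ids = tokenizer(example["inputs"], return_tensors="pt").input_ids
    scores = []
    for choice in example["multiple_choice_targets"]:
        labels = tokenizer(choice, return_tensors="pt").input_ids
        with torch.no_grad():
            # .loss is the mean token NLL; multiply by length to recover the
            # summed log-likelihood that rank classification compares.
            mean_nll = model(input_ids=input_ids, labels=labels).loss
        scores.append(-mean_nll.item() * labels.size(1))
    prediction = scores.index(max(scores))
    gold = example["multiple_choice_scores"].index(1)  # 1 marks the correct choice
    correct += int(prediction == gold)

print(f"accuracy: {correct / len(dataset):.4f}")
```

In practice you would want to batch the choices and run on a GPU (as `run_eval.py` does), but something like this should be enough to get a first number.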