Only skip benchmarking if split results are the same too
Muennighoff opened this issue · 1 comment
Muennighoff commented
We currently skip evaluation if the results file already exists. However, the file may exist without the same splits/subsets as requested by the current evaluation. I think we should also check equivalence across splits & subsets, run only the ones still missing, and merge their results into the existing results file.
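A rough sketch of what that check-and-merge step could look like. Everything here is an assumption for illustration, not the project's actual code: the results file is assumed to be JSON shaped like `{split: {subset: scores}}`, and `missing_evaluations`, `evaluate_missing`, and the `run_eval` callback are hypothetical names.

```python
import json
from pathlib import Path


def missing_evaluations(existing, requested):
    """Return {split: [subsets]} that are requested but absent from existing results.

    Assumed (hypothetical) layout of `existing`: {split: {subset: scores}}.
    `requested` maps each split to the list of subsets to evaluate.
    """
    missing = {}
    for split, subsets in requested.items():
        done = existing.get(split, {})
        todo = [subset for subset in subsets if subset not in done]
        if todo:
            missing[split] = todo
    return missing


def evaluate_missing(results_path, requested, run_eval):
    """Run only the missing split/subset pairs and merge them into the results file.

    `run_eval(split, subset)` is a hypothetical callback returning a scores dict.
    Returns the merged results and the {split: [subsets]} that were actually run.
    """
    path = Path(results_path)
    existing = json.loads(path.read_text()) if path.exists() else {}
    todo = missing_evaluations(existing, requested)
    for split, subsets in todo.items():
        for subset in subsets:
            existing.setdefault(split, {})[subset] = run_eval(split, subset)
    if todo:
        # Rewrite the file only when something new was computed.
        path.write_text(json.dumps(existing, indent=2))
    return existing, todo
```

With this shape, evaluation is skipped entirely only when `todo` comes back empty, i.e. when every requested split/subset is already in the file. Results that are already present are left untouched.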