embeddings-benchmark/mteb

standardize descriptive stats

KennethEnevoldsen opened this issue · 2 comments

Currently, descriptive stats are quite inconsistent. This leads to problems, e.g. if we want to calculate the number of characters per task to estimate the number of compute tokens needed.

All of these calculations could be automated and are implemented in `_calculate_metrics_from_split`; however, they are not run for all datasets. It would be great to have a test that checks that these stats are calculated consistently across all tasks.
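A minimal sketch of what such a test could look like, using `mteb.get_tasks`; the `descriptive_stats` attribute and its keys are assumptions for illustration, not the current interface:

```python
import pytest

import mteb


@pytest.mark.parametrize("task", mteb.get_tasks())
def test_descriptive_stats_are_computed(task):
    # Every task should expose the same automatically derived statistics,
    # e.g. number of samples and number of characters per split.
    stats = task.metadata.descriptive_stats  # assumed attribute
    assert stats is not None, f"{task.metadata.name} has no descriptive stats"
    for split, split_stats in stats.items():
        # Keys below are illustrative; the point is a shared schema.
        assert "num_samples" in split_stats, f"{split} is missing num_samples"
        assert "average_character_length" in split_stats
```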

Additionally, this data is currently included in the metadata, which might not be ideal (it often requires copy-paste, which can introduce errors). A solution could be to write it to a JSON file from which the data is fetched when needed. Tests could then fail if this cache is incomplete.
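A rough sketch of how such a cache could work; the directory layout and helper names are hypothetical:

```python
import json
from pathlib import Path

# Assumed location inside the repo; one JSON file per task, regenerated by a
# script rather than copy-pasted into the metadata.
STATS_DIR = Path("mteb/descriptive_stats")


def write_stats(task_name: str, stats: dict) -> None:
    STATS_DIR.mkdir(parents=True, exist_ok=True)
    (STATS_DIR / f"{task_name}.json").write_text(json.dumps(stats, indent=2))


def load_stats(task_name: str) -> dict:
    path = STATS_DIR / f"{task_name}.json"
    if not path.exists():
        # A test suite could fail here to signal an incomplete cache.
        raise FileNotFoundError(f"No cached descriptive stats for {task_name}")
    return json.loads(path.read_text())
```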

I've been considering improvements to `metadata_metrics` as well.

> A solution could be to write it to a JSON file from which the data is fetched when needed. Tests could then fail if this cache is incomplete.

That's a great suggestion! Are you suggesting storing a JSON file with all the metadata directly in the mteb repository?

Yep - packaged into the package.
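If the JSON files ship with the installed package, they could be read via `importlib.resources`; the `mteb.descriptive_stats` package path here is an assumption:

```python
import json
from importlib.resources import files


def load_packaged_stats(task_name: str) -> dict:
    # Reads a cached stats file that ships with the installed package, so no
    # repo checkout is needed at runtime.
    resource = files("mteb.descriptive_stats").joinpath(f"{task_name}.json")
    return json.loads(resource.read_text(encoding="utf-8"))
```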