embeddings-benchmark/mteb

Evaluating only on English

Closed this issue · 3 comments

Hello,

We are interested in evaluating MTEB only on English (our model is neither multi- nor cross-lingual). That is, for multilingual datasets, we would only like to select the "eng" subset, and for cross-lingual datasets, we would only like to evaluate on "en-en".

This used to be possible, in a roundabout way, like so:

task_names = [task for task in mteb.MTEB_MAIN_EN.tasks if mteb.get_task(task).metadata.type in args.task_types]

evaluation = mteb.MTEB(tasks=task_names, task_langs=["en"])
evaluation.run(embedder, eval_splits=["test"], output_folder=f"results/{name}")

Removing cross-lingual and multilingual tasks doesn't work, as this removes entire tasks rather than just the non-English subsets of tasks.
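To illustrate the distinction, here is a minimal, self-contained sketch (not the mteb API; task names and subset lists are hypothetical) of the difference between dropping whole tasks and selecting only the English subsets within each task:

```python
# Hypothetical task -> language-subset mapping, for illustration only.
tasks = {
    "STS17": ["en-en", "en-de", "fr-fr"],    # cross-lingual pairs
    "MassiveIntent": ["eng", "deu", "fra"],  # multilingual subsets
}

def english_subsets(langs):
    # Keep only the English subsets ("eng") and English pairs ("en-en").
    return [l for l in langs if l in ("eng", "en-en")]

# Selecting subsets keeps every task, but restricted to English:
selected = {name: english_subsets(langs) for name, langs in tasks.items()}
print(selected)
```

Dropping the tasks entirely would remove "STS17" and "MassiveIntent" from the evaluation altogether, whereas the subset selection above keeps them with only their English portions.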

But this gets me a bunch of DeprecationWarnings now. Is there a nice way to replicate the behavior of this piece of code? I think it would be nice to still have a good way of running just the English MTEB tasks, without having to run on a lot of languages that won't work anyway.

The intended use is:

import mteb
from mteb.benchmarks import MTEB_MAIN_EN

evaluation = mteb.MTEB(tasks=MTEB_MAIN_EN)
evaluation.run(embedder, eval_splits=["test"], output_folder=f"results/{name}")

And yes, this results in some deprecation warnings, as some of the datasets have now been updated (e.g. to run faster). All the tasks are still runnable. We will soon release a v2: a zero-shot and notably faster version of the English benchmark, which will use the updated tasks. However, we will maintain support for MTEB v1.

To make this slightly easier, I have updated the interface in #1208. Once it is merged, the usage should be as follows (though the old approach will still work):

import mteb

benchmark = mteb.get_benchmark("MTEB(eng)")
evaluation = mteb.MTEB(tasks=benchmark)
evaluation.run(embedder, eval_splits=["test"], output_folder=f"results/{name}")

Edit: this PR also ensures that only the English subsets are used, which means you no longer need to pass task_langs=["en"].

Closing this for now as the referenced PR is merged. Feel free to reopen if there is anything unclear.