embeddings-benchmark/mteb

Add older MMTEB baselines

orionw opened this issue · 1 comments

I was aggregating models for a different project and realized there are a couple of older multilingual baselines we should try: facebook/mcontriever-msmarco (multilingual Contriever) and castorini/mdpr-tied-pft-msmarco (DPR-based, with tied encoders). I don't think they're going to be very strong, but they're potentially worth having since they're baselines people may have heard of. Both can be run with sentence-transformers out of the box.
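As a rough sketch of what "out of the box" looks like for these bi-encoder baselines: encode query and passages, then rank by cosine similarity. The toy vectors below stand in for `model.encode(...)` outputs so the snippet runs without downloading a checkpoint; the commented lines show the assumed sentence-transformers usage.

```python
import numpy as np

# In practice the vectors would come from one of the baselines, e.g.:
#   from sentence_transformers import SentenceTransformer
#   model = SentenceTransformer("castorini/mdpr-tied-pft-msmarco")
#   q_vec = model.encode("my query")
#   p_vecs = model.encode(["passage one", "passage two"])
# Fixed toy vectors are used here so the example is self-contained.

def rank_passages(q_vec: np.ndarray, p_vecs: np.ndarray):
    """Return passage indices sorted by cosine similarity to the query."""
    q = q_vec / np.linalg.norm(q_vec)
    p = p_vecs / np.linalg.norm(p_vecs, axis=1, keepdims=True)
    scores = p @ q
    return np.argsort(-scores), scores

q_vec = np.array([1.0, 0.0, 1.0])
p_vecs = np.array([[0.9, 0.1, 0.8],   # close to the query direction
                   [0.0, 1.0, 0.0]])  # nearly orthogonal to the query
order, scores = rank_passages(q_vec, p_vecs)
print(order)  # → [0 1]
```

Both models produce single dense vectors per text, so this dot-product ranking is all the MTEB retrieval tasks need from them.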

cc @KennethEnevoldsen and @Muennighoff

Great! I ran nthakur/mcontriever-base-msmarco (the facebook one did not work with ST for me) and castorini/mdpr-tied-pft-msmarco. Results are here: embeddings-benchmark/results#40

Are we good on bge-m3 and gte-multilingual-base, or are results still missing? I think we should have those as well.

https://huggingface.co/BAAI/bge-multilingual-gemma2 also seems worth running, but it's not in MTEB yet, I think 🤔