embeddings-benchmark/mteb

Massive Multimodal Extension of MTEB


This issue is to discuss tasks to add for a massive multimodal extension of MTEB. The modalities are:

  • T=Text
  • I=Image
  • A=Audio
  • V=Video without audio, i.e. just multiple images

Please add any tasks to this sheet. There are 100 input/output modality combinations in total. Beyond modality combinations, we'd also like to cover as many task types as possible (Retrieval, Classification, etc.). Feel free to edit the sheet to add more tasks, or just comment them here and someone will add them. A simple schematic is included below.

[Image: schematic screenshot, 2024-09-27]
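For anyone wondering where the figure of 100 could come from: the issue does not spell out the counting rule, so the following is only one plausible reading. It assumes each side (input or output) uses either a single modality or an unordered pair of modalities, which gives 10 options per side and 10 × 10 = 100 input/output pairs. The names `MODALITIES`, `sides`, and `pairs` are made up for this sketch and do not come from the sheet or from MTEB.

```python
from itertools import combinations, product

# The four modalities listed in the issue.
MODALITIES = ["T", "I", "A", "V"]

# ASSUMPTION (not stated in the issue): each side of a task uses
# one modality or an unordered pair of two modalities.
sides = ["".join(c) for r in (1, 2) for c in combinations(MODALITIES, r)]

# Every (input, output) pairing of those per-side options.
pairs = list(product(sides, repeat=2))

print(sides)       # per-side options, e.g. "T", "TI", "AV"
print(len(pairs))  # number of input/output combinations under this assumption
```

If the sheet counts combinations differently (for example, allowing three or four modalities on one side), the total would not be 100, so treat this as a sanity check on the number rather than a definition.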

@Muennighoff https://x.com/WenhuChen/status/1844577017930694984

Massive Multimodal Embedding Benchmark (MMEB).