Massive Multimodal Extension of MTEB
Muennighoff commented
This issue is to discuss tasks to add for a massive multimodal extension of MTEB. The modalities are:
- T=Text
- I=Image
- A=Audio
- V=Video without audio, i.e. just multiple images
Please add any tasks to this sheet. There are 100 input/output modality combinations in total. Besides covering modality combinations, we'd also like to cover as many task types as possible (Retrieval, Classification, etc.). Feel free to edit the sheet directly to add more tasks, or just comment them and someone will add them. Also find a simple schematic below.
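The issue does not say how the 100 combinations are counted. One reading that reproduces the number, purely as an assumption, is that each side (input and output) consists of either one or two of the four modalities: 4 singles plus 6 pairs gives 10 options per side, and 10 × 10 = 100. A minimal sketch of that enumeration follows; the names `MODALITIES` and `modality_sets` are hypothetical and not taken from the sheet or from MTEB.

```python
from itertools import combinations, product

# The four modality codes listed in the issue.
MODALITIES = ["T", "I", "A", "V"]


def modality_sets(max_size=2):
    """Return every set of 1..max_size modalities as a string, e.g. "T" or "TI".

    Assumption: a side of a task uses at most two modalities. The issue does
    not state this; it is only one counting that yields 100 combinations.
    """
    sets = []
    for size in range(1, max_size + 1):
        sets.extend("".join(combo) for combo in combinations(MODALITIES, size))
    return sets


# Options for one side: 4 single modalities + 6 unordered pairs.
sides = modality_sets()

# Every (input, output) pairing of those options.
pairs = list(product(sides, repeat=2))

print(len(sides), len(pairs))
```

Under this assumption, a pair such as `("TI", "A")` would stand for a task with text-plus-image input and audio output.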
SaitejaUtpala commented
@Muennighoff https://x.com/WenhuChen/status/1844577017930694984
Massive Multimodal Embedding Benchmark (MMEB).