MMMU-Benchmark/MMMU
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI".
Python · Apache-2.0
Issues
- process_single_sample function's question (#16, opened by bruceisme, 2 comments)
- Answer not present in the model prediction (#41, opened by insafim, 1 comment)
- Qwen2-VL-7B Inference Code (#42, opened by insafim, 8 comments)
- MMMU-Pro 4 Choices (#40, opened by insafim, 2 comments)
- Temperature setting (#38, opened by starriver030515, 2 comments)
- Enquiry about the usage of your dataset (#33, opened by tunantu, 3 comments)
- Add validation set to EvalAI (#30, opened by dchichkov, 0 comments)
- ls (#29, opened by yuanze-lin, 0 comments)
- .tsv file (#28, opened by beichenzbc, 2 comments)
- validation_Materials_25 answer seems wrong? (#27, opened by Zarjagen, 17 comments)
- RuntimeError: The size of tensor a (162) must match the size of tensor b (7) at non-singleton dimension 1 (#18, opened by nrikoh, 3 comments)
- GPT4o (#24, opened by dirtycomputer, 3 comments)
- GPT-4V refuses to answer / insists on "I'm sorry, but I'm unable to view images" (#19, opened by SweetGUOguo, 4 comments)
- PNG files are not converted to RGB (#17, opened by y-vectorfield, 1 comment)
- Request for answer_dict.json for test and dev (#15, opened by boxin-wbx, 6 comments)
- Image and JSON dataset (#13, opened by sxj1215, 2 comments)
- How was "prompt engineering" performed? (#12, opened by mckinziebrandon, 7 comments)
- Representing LLaVA-1.5-13b (#10, opened by teasgen, 11 comments)
- Model Evaluation (#9, opened by Rubics-Xuan, 1 comment)
- Mismatch of the data label in Eval code (#11, opened by XiongweiWu, 3 comments)
- model evaluation (#3, opened by mactavish91, 2 comments)
- Question about "Text as Input" (#8, opened by fxmeng, 4 comments)
- Error reports when loading the dataset (#4, opened by XiongweiWu, 8 comments)
- Evaluation Prompt for mPLUG-Owl2 (#1, opened by vateye, 1 comment)