mertyg/vision-language-models-are-bows
Experiments and data for the paper "When and why vision-language models behave like bags-of-words, and what to do about it?" (Oral @ ICLR 2023).
Python · MIT License
Issues
Evaluation bug when using GELU vs QuickGELU -- changes the results for some benchmarks
#35 opened by bryant1410 · 4 comments
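The GELU-vs-QuickGELU issue above comes down to a small numeric difference between two activations: OpenAI's original CLIP uses QuickGELU, `x * sigmoid(1.702 * x)`, while many re-implementations default to the exact GELU, `x * Phi(x)`. A minimal pure-Python sketch of the gap (illustrative only, not the repo's code):

```python
import math

def gelu(x):
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def quick_gelu(x):
    """QuickGELU: sigmoid-based approximation used in the original CLIP."""
    return x * (1.0 / (1.0 + math.exp(-1.702 * x)))

# The two agree at 0 but diverge slightly elsewhere, e.g. near x = 1;
# accumulated over many transformer layers, this can shift benchmark scores.
diff = abs(gelu(1.0) - quick_gelu(1.0))
```

Loading a QuickGELU-trained checkpoint into a model built with exact GELU (or vice versa) is therefore a silent evaluation mismatch.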
About the performance of the original CLIP
#32 opened by hiker-lw · 1 comment
Question regarding numbers in Figure 1
#36 opened by YunYunY · 1 comment
Questions on evaluation results
#33 opened by ytaek-oh · 1 comment
Exact hyperparameters for NegCLIP training, and question about ImageNet accuracy reported in the paper
#4 opened by HarmanDotpy · 1 comment
I cannot run on RTX 3060 with batch-size=256!
#30 opened by shuguang99 · 10 comments
NegCLIP training result problem
#27 opened by haoshuai714 · 2 comments
I can't reproduce Table 6
#29 opened by shuguang99 · 2 comments
Parameter file problem
#28 opened by haoshuai714 · 0 comments
Model weights of regular COCO finetuning
#25 opened by wildphoton · 8 comments
FLAVA image preprocessing
#24 opened by DianeBouchacourt · 1 comment
Slow evaluation for XVLM
#23 opened by lezhang7 · 7 comments
Projections W_i and W_t
#22 opened by DianeBouchacourt · 2 comments
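The W_i / W_t issue above refers to CLIP-style learned projections that map image and text encoder outputs into a shared embedding space before scoring. A toy sketch of that scoring path (all matrices, dimensions, and values below are illustrative, not the repo's actual weights):

```python
import math

def matvec(W, v):
    """Project vector v with matrix W (rows = output dimensions)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

# Toy 2x3 projections W_i (image) and W_t (text), and 3-dim encoder features.
W_i = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
W_t = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
img_feat, txt_feat = [1.0, 2.0, 3.0], [2.0, 1.0, 3.0]

# Project both modalities into the shared space, then compare.
score = cosine(matvec(W_i, img_feat), matvec(W_t, txt_feat))
```

In real CLIP the projected embeddings are unit-normalized and scaled by a learned temperature before the contrastive loss; the sketch keeps only the projection-then-similarity structure.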
Where to find the training data of NegCLIP?
#21 opened by Wyattwwwww · 1 comment
Call model.eval() when computing scores, otherwise results are non-deterministic (torch.no_grad() is not enough)
#17 opened by DianeBouchacourt · 8 comments
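The issue above reflects a general PyTorch pitfall: `torch.no_grad()` only disables autograd bookkeeping, while stochastic layers such as Dropout keep sampling random masks until `model.eval()` switches them to inference behavior. A minimal sketch (assuming PyTorch; the model here is a toy stand-in, not one from this repo):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5))
x = torch.ones(1, 4)

model.train()
with torch.no_grad():
    # Dropout is still active: two forward passes generally differ,
    # even though no gradients are being tracked.
    a, b = model(x), model(x)

model.eval()
with torch.no_grad():
    # Dropout (and BatchNorm, if present) now use inference behavior:
    # repeated forward passes give identical scores.
    c, d = model(x), model(x)
```

For evaluation code, the safe pattern is to call both: `model.eval()` for deterministic layer behavior and `torch.no_grad()` to skip gradient overhead.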
Requirements (e.g. torch versions)
#16 opened by DianeBouchacourt · 1 comment
Models are not in eval() mode
#7 opened by linzhiqiu · 3 comments
Questions on BLIP score computation
#15 opened by DianeBouchacourt · 25 comments
Mismatching results on compositional task
#9 opened by lezhang7 · 2 comments
Eval COCO order and Flickr order
#11 opened by lezhang7 · 4 comments
Question about VG-Relation categories
#8 opened by hiker-lw · 6 comments
Why concat df to all_df?
#5 opened by lezhang7 · 5 comments
When can you provide code and dataset?
#2 opened by BigHyf