researchmm/soho

[CVPR'21 Oral] Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning

Python

Issues

Cannot reproduce the performance on the Visual Entailment dataset.
#3 opened 3 years ago by youngfly11
4
The Accuracy of Masked Visual Modeling
#10 opened 3 years ago by mhyeh
1
You've open-sourced basically nothing..
#13 opened 3 years ago by Sry2016
0
Pretrained model of SOHO based on ResNet-101
#12 opened 3 years ago by maogewudi007
0
It is abnormal, so many unexpected keys???
#11 opened 3 years ago by alice-cool
0
The download links may be broken; can you update them? Thank you, sir.
#9 opened 3 years ago by syiswell
0
new tool
#8 opened 3 years ago by mocki-zz
0
How to evaluate image/text retrieval on SOHO?
#7 opened 3 years ago by byougert
0
Presentation slide
#4 opened 3 years ago by pqviet
0
No module named 'SOHO.version'
#6 opened 3 years ago by grandsmile
2
Pretrained models cannot be downloaded
#5 opened 3 years ago by kaizhigaosu
1
Do you plan to release the training configurations and scripts of the pre-training?
#2 opened 4 years ago by Jxu-Thu
0
Where can I find the VD?
#1 opened 4 years ago by LIUYUANWEI98
1