Table 6: Performance of Alpha-CLIP in region level captioning
jetyingjia commented
Great work!
I am confused by the Table 6 results. Is the reported performance from Alpha-CLIP with LLaVA-1.5, or from fine-tuning this model with Vicuna-7B on these datasets (RefCOCOg or VG)?