Table 6: Performance of Alpha-CLIP in region level captioning

Question

Opened this issue 7 months ago · 1 comment

Great work!
I am confused by the results in Table 6. Is the reported performance obtained with Alpha-CLIP plugged into LLaVA-1.5, or by fine-tuning the model with vicuna-7b on these datasets (RefCOCOg or VG)?

Answer 1 · 2024-03-04T04:09:37.000Z

Hi, this has been discussed in #24.