ziqipang/LM4VisualEncoding
[ICLR 2024 (Spotlight)] "Frozen Transformers in Language Models are Effective Visual Encoder Layers"
Python · MIT License
Issues
- Influence of ViT (#11, opened by jiazhen-code)
- Curious about which LLaMA checkpoint (#10, opened by 944104439)
- 2D VQA and Image-Text Retrieval (#6, opened by xvolica)
- LLMBoostMedical (#9, opened by daydayupzzl)
- About Motion Forecasting (#8, opened by Zbozhou)
- Sharing experiments on lung sound abnormality detection, and a suggestion: add experiments with randomly initialized LLM-layer weights (#7, opened by QiaoranC)
- Some questions about ViT-Small-LLaMA (#4, opened by 1090h2400)
- Are there any ablation studies on the number of LLM layers inserted between the visual encoder and classifiers? (#2, opened by valencebond)