yuecao0119/MMFuser
The official implementation of the paper "MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding". MMFuser addresses the limitations of current multimodal large language models (MLLMs) in capturing complex image details by integrating multi-layer features from Vision Transformers (ViTs) in a simple yet efficient way.
Python · Apache-2.0
Stargazers
- Abby-m · Xidian University
- ChaofanChen-fanren · Beijing
- Char1sk
- CSer-Tang-hao · The Hong Kong Polytechnic University
- cydiachen
- czczup · Nanjing University
- daniel620
- dblasko · INSA Lyon / TU Wien
- duyao-art
- Echo0125 · KwaiVGI
- eharecz
- elejke
- EthanJi29 · Nanjing University
- EveAny
- imr555 · Neovotech
- InternVL
- Ivesfu
- JP-Morgan
- Jty-123 · UCAS
- lll2343
- lofrienger · The Chinese University of Hong Kong
- lyccnb
- ngthanhtin · N.G.U
- SSshuishui · Beihang University
- SunnyHaze · MBZUAI (Mohamed Bin Zayed University of AI)
- taesiri · Planet Mars
- TGLTommy
- wade0604 · Bytedance
- whuhxb
- Xuchen-Li · Beijing Zhongguancun Academy & Institute of Automation, Chinese Academy of Sciences
- xuyang-liu16 · Sichuan University
- yahooo-m
- yuecao0119 · Nanjing University
- Yuhan2001 · Nanjing University
- zehuichen123 · USTC
- zhangchbin · The University of Hong Kong