Pinned Repositories
TinyGPT-V
TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones
ClipCropping
dylanqyuan
VLMEvalKit
Open-source evaluation toolkit for large vision-language models (LVLMs); supports GPT-4V, Gemini, QwenVLPlus, 50+ HF models, and 20+ benchmarks
MiniGPT-5
Official implementation of paper "MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens"
VLMEvalKit
Open-source evaluation toolkit for large vision-language models (LVLMs); supports 160+ VLMs and 50+ benchmarks
Q-Instruct
[CVPR 2024] Low-level visual instruction tuning, with a 200K dataset and a model zoo of fine-tuned checkpoints.
LLaVAR
Code/Data for the paper: "LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding"
AesBench
An expert benchmark that comprehensively evaluates the aesthetic perception capabilities of MLLMs.
AesExpert
[ACMMM 2024] AesExpert: Towards Multi-modality Foundation Model for Image Aesthetics Perception
dylanqyuan's Repositories
dylanqyuan/ClipCropping
dylanqyuan/dylanqyuan
dylanqyuan/VLMEvalKit
Open-source evaluation toolkit for large vision-language models (LVLMs); supports GPT-4V, Gemini, QwenVLPlus, 50+ HF models, and 20+ benchmarks