yiheng003's Stars
BradyFU/Awesome-Multimodal-Large-Language-Models
:sparkles::sparkles: Latest Advances on Multimodal Large Language Models
rom1504/clip-retrieval
Easily compute CLIP embeddings and build a CLIP retrieval system with them
EvolvingLMMs-Lab/lmms-eval
Accelerating the development of large multimodal models (LMMs) with a one-click evaluation module, lmms-eval.
Xnhyacinth/Awesome-LLM-Long-Context-Modeling
📰 Must-read papers and blogs on LLM based Long Context Modeling 🔥
DAMO-NLP-SG/VideoLLaMA2
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
bojone/rerope
Rectified Rotary Position Embeddings
AutoSurveys/AutoSurvey
PrincetonUniversity/multi_gpu_training
DAMO-NLP-SG/VCD
[CVPR 2024 Highlight] Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding
oddmario/NVIDIA-Ubuntu-Driver-Guide
A little guide to help you install & manage the NVIDIA GPU driver on your Ubuntu system
microsoft/LongRoPE
LongRoPE is a novel method that extends the context window of pre-trained LLMs to an impressive 2048k tokens.
facebookresearch/unibench
Python library to evaluate the robustness of VLMs across diverse benchmarks
vitali-fedulov/images4
Image similarity in Golang. Version 4 (LATEST)
zhangbaijin/From-Redundancy-to-Relevance
[NAACL 2025 Oral] From redundancy to relevance: Enhancing explainability in multimodal large language models
Baiqi-Li/NaturalBench
🚀 [NeurIPS 2024] Make vision matter in Visual Question Answering (VQA)! Introducing NaturalBench, a vision-centric VQA benchmark that challenges vision-language models with simple questions about natural imagery.
xing0047/cca-llava
[NeurIPS 2024] Mitigating Object Hallucination via Concentric Causal Attention
joez17/VideoNIAH
VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs
DAMO-NLP-SG/CMM
✨✨The Curse of Multi-Modalities (CMM): Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio
niejiahao1998/MMRel
visual-haystacks/vhs_benchmark
🔥 [ICLR 2025] Official Benchmark Toolkits for "Visual Haystacks: A Vision-Centric Needle-In-A-Haystack Benchmark"
Houtx/AI-
Prompts for GPT-4V & DALL-E 3 to fully utilize their multi-modal abilities. GPT-4V prompts, DALL-E 3 prompts.