yiheng003's Stars
BradyFU/Awesome-Multimodal-Large-Language-Models
:sparkles::sparkles: Latest Advances on Multimodal Large Language Models
rom1504/clip-retrieval
Easily compute CLIP embeddings and build a CLIP retrieval system with them
EvolvingLMMs-Lab/lmms-eval
Accelerating the development of large multimodal models (LMMs) with a one-click evaluation module, lmms-eval.
Xnhyacinth/Awesome-LLM-Long-Context-Modeling
📰 Must-read papers and blogs on LLM based Long Context Modeling 🔥
DAMO-NLP-SG/VideoLLaMA2
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
bojone/rerope
Rectified Rotary Position Embeddings
AutoSurveys/AutoSurvey
PrincetonUniversity/multi_gpu_training
DAMO-NLP-SG/VCD
[CVPR 2024 Highlight] Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding
oddmario/NVIDIA-Ubuntu-Driver-Guide
A little guide to help you install & manage the NVIDIA GPU driver on your Ubuntu system
microsoft/LongRoPE
LongRoPE is a novel method that extends the context window of pre-trained LLMs to an impressive 2048k tokens.
facebookresearch/unibench
Python library to evaluate the robustness of VLMs across diverse benchmarks
vitali-fedulov/images4
Image similarity in Golang. Version 4 (LATEST)
zhangbaijin/From-Redundancy-to-Relevance
[NAACL 2025 Oral] From redundancy to relevance: Enhancing explainability in multimodal large language models
Baiqi-Li/NaturalBench
🚀 [NeurIPS 2024] Make vision matter in Visual Question Answering (VQA)! Introducing NaturalBench, a vision-centric VQA benchmark that challenges vision-language models with simple questions about natural imagery.
xing0047/cca-llava
[NeurIPS 2024] Mitigating Object Hallucination via Concentric Causal Attention
joez17/VideoNIAH
VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs
DAMO-NLP-SG/CMM
✨✨The Curse of Multi-Modalities (CMM): Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio
niejiahao1998/MMRel
visual-haystacks/vhs_benchmark
🔥 [ICLR 2025] Official Benchmark Toolkits for "Visual Haystacks: A Vision-Centric Needle-In-A-Haystack Benchmark"
Houtx/AI-
Prompts for GPT-4V & DALL-E 3 to fully utilize their multi-modal abilities. GPT-4V prompts, DALL-E 3 prompts.