leexinhao
I am an MS student at Nanjing University. My research interests mainly lie in efficient video/image understanding and generation methods.
SenseTime · Nanjing
leexinhao's Stars
OpenBMB/MiniCPM-V
MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
voxel51/fiftyone
Refine high-quality datasets and visual AI models
QwenLM/Qwen2-VL
Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
LLaVA-VL/LLaVA-NeXT
EvolvingLMMs-Lab/lmms-eval
Accelerating the development of large multimodal models (LMMs) with the one-click evaluation module lmms-eval.
yunlong10/Awesome-LLMs-for-Video-Understanding
🔥🔥🔥 Latest papers, code, and datasets on Vid-LLMs.
MeetKai/functionary
Chat language model that can use tools and interpret the results
rhymes-ai/Aria
Codebase for Aria - an Open Multimodal Native MoE
zhuzilin/ring-flash-attention
Ring attention implementation with flash attention
rese1f/MovieChat
[CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understanding
SHI-Labs/NATTEN
Neighborhood Attention Extension. Bringing attention to a neighborhood near you!
showlab/videollm-online
VideoLLM-online: Online Video Large Language Model for Streaming Video (CVPR 2024)
pkunlp-icler/FastV
[ECCV 2024 Oral] Code for paper: An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models
daixiangzi/Awesome-Token-Compress
A paper list of recent works on token compression for ViT and VLM
EvolvingLMMs-Lab/LongVA
Long Context Transfer from Language to Vision
VectorSpaceLab/Video-XL
🔥🔥 First-ever hour-scale video understanding models
kongds/E5-V
E5-V: Universal Embeddings with Multimodal Large Language Models
JUNJIE99/MLVU
🔥🔥MLVU: Multi-task Long Video Understanding Benchmark
IVGSZ/Flash-VStream
This is the official implementation of "Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams"
MCG-NJU/AWT
[NeurIPS 2024] AWT: Transferring Vision-Language Models via Augmentation, Weighting, and Transportation
egoschema/EgoSchema
Gumpest/SparseVLMs
Official implementation of paper "SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference".
QQ-MM/Video-CCAM
A lightweight, flexible Video-MLLM developed by the Tencent QQ Multimedia Research Team.
KangarooGroup/Kangaroo
Official implementation of Kangaroo: A Powerful Video-Language Model Supporting Long-context Video Input
Beckschen/LLaVolta
[NeurIPS 2024] Efficient Large Multi-modal Models via Visual Context Compression
fmthoker/SEVERE-BENCHMARK
MCG-NJU/ZeroI2V
[ECCV 2024] ZeroI2V: Zero-Cost Adaptation of Pre-trained Transformers from Image to Video
MCG-NJU/VideoEval
VideoEval: Comprehensive Benchmark Suite for Low-Cost Evaluation of Video Foundation Model
XingruiWang/SuperCLEVR-Physics
A video question answering dataset that focuses on the dynamics properties of objects (velocity, acceleration) and their collisions within 4D scenes.
leexinhao/VideoEval
A vision-centric evaluation method for video foundation models that is comprehensive, challenging, indicative, and low-cost.