leexinhao
I am an MS student at Nanjing University. My research interests mainly lie in efficient video/image understanding and generation methods.
SenseTime · Nanjing
leexinhao's Stars
OpenBMB/MiniCPM-V
MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
voxel51/fiftyone
Refine high-quality datasets and visual AI models
QwenLM/Qwen2-VL
Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
LLaVA-VL/LLaVA-NeXT
EvolvingLMMs-Lab/lmms-eval
Accelerating the development of large multimodal models (LMMs) with the one-click evaluation module lmms-eval.
yunlong10/Awesome-LLMs-for-Video-Understanding
🔥🔥🔥 Latest papers, code, and datasets on Vid-LLMs.
MeetKai/functionary
Chat language model that can use tools and interpret the results
rhymes-ai/Aria
Codebase for Aria - an Open Multimodal Native MoE
zhuzilin/ring-flash-attention
Ring attention implementation with flash attention
rese1f/MovieChat
[CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understanding
SHI-Labs/NATTEN
Neighborhood Attention Extension. Bringing attention to a neighborhood near you!
showlab/videollm-online
VideoLLM-online: Online Video Large Language Model for Streaming Video (CVPR 2024)
pkunlp-icler/FastV
[ECCV 2024 Oral] Code for paper: An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models
daixiangzi/Awesome-Token-Compress
A paper list of recent works on token compression for ViT and VLM
EvolvingLMMs-Lab/LongVA
Long Context Transfer from Language to Vision
VectorSpaceLab/Video-XL
🔥🔥 First-ever hour-scale video understanding models
kongds/E5-V
E5-V: Universal Embeddings with Multimodal Large Language Models
JUNJIE99/MLVU
🔥🔥MLVU: Multi-task Long Video Understanding Benchmark
IVGSZ/Flash-VStream
This is the official implementation of "Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams"
MCG-NJU/AWT
[NeurIPS 2024] AWT: Transferring Vision-Language Models via Augmentation, Weighting, and Transportation
egoschema/EgoSchema
Gumpest/SparseVLMs
Official implementation of paper "SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference".
QQ-MM/Video-CCAM
A lightweight, flexible Video-MLLM developed by the Tencent QQ Multimedia Research Team.
KangarooGroup/Kangaroo
Official implementation of Kangaroo: A Powerful Video-Language Model Supporting Long-context Video Input
Beckschen/LLaVolta
[NeurIPS 2024] Efficient Large Multi-modal Models via Visual Context Compression
fmthoker/SEVERE-BENCHMARK
MCG-NJU/ZeroI2V
[ECCV 2024] ZeroI2V: Zero-Cost Adaptation of Pre-trained Transformers from Image to Video
MCG-NJU/VideoEval
VideoEval: Comprehensive Benchmark Suite for Low-Cost Evaluation of Video Foundation Model
XingruiWang/SuperCLEVR-Physics
A video question answering dataset that focuses on the dynamics properties of objects (velocity, acceleration) and their collisions within 4D scenes.
leexinhao/VideoEval
A vision-centric evaluation method for video foundation models that is comprehensive, challenging, indicative, and low-cost.