R00Kie-Liu's Stars
tychen-SJTU/MECD-Benchmark
[NeurIPS'24 spotlight] MECD: Unlocking Multi-Event Causal Discovery in Video Reasoning
hotelll/Collaborative_Procedure_Alignment
Implementation of our journal paper "Achieving Procedure-Aware Instructional Video Correlation Learning under Weak Supervision from a Collaborative Perspective"
haowuxc/DIBS
[CVPR 2024] DIBS: Enhancing Dense Video Captioning with Unlabeled Videos via Pseudo Boundary Enrichment and Online Refinement
Jiaxuan-Li/EVCap
[CVPR 2024] Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension
DirtyHarryLYL/LLM-in-Vision
Recent LLM-based CV and related works. Welcome to comment/contribute!
2noise/ChatTTS
A generative speech model for daily dialogue.
wdndev/llm_interview_note
Notes on the knowledge and interview questions relevant to large language model (LLM) algorithm and application engineers
TencentARC/ST-LLM
[ECCV 2024🔥] Official implementation of the paper "ST-LLM: Large Language Models Are Effective Temporal Learners"
EvolvingLMMs-Lab/lmms-eval
Accelerating the development of large multimodal models (LMMs) with one-click evaluation module - lmms-eval.
bpiyush/TestOfTime
Official code for our CVPR 2023 paper: Test of Time: Instilling Video-Language Models with a Sense of Time
llyx97/TempCompass
[ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, Lei Li, Sishuo Chen, Xu Sun, Lu Hou
yunlong10/Awesome-LLMs-for-Video-Understanding
🔥🔥🔥Latest Papers, Codes and Datasets on Vid-LLMs.
YuxiXie/ECHo
This repository contains data and code for the paper ECHo: Event Causality Inference via Human-centric Reasoning.
fudan-zvg/Reason2Drive
Reason2Drive: Towards Interpretable and Chain-based Reasoning for Autonomous Driving
state-spaces/mamba
Mamba SSM architecture
MILVLG/prophet
Implementation of CVPR 2023 paper "Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering".
R00Kie-Liu/Sampler
Task-adaptive Spatial-Temporal Video Sampler for Few-shot Action Recognition
AtomScott/soccer_narrator
narration for soccer
google-research/football
Check out the new game server:
zyayoung/Awesome-Video-LLMs
Explore VLM-Eval, a framework for evaluating Video Large Language Models, enhancing your video analysis with cutting-edge AI technology.
vaishnaviHimakunthala/VIP
BradyFU/Awesome-Multimodal-Large-Language-Models
✨✨ Latest Advances on Multimodal Large Language Models
WisconsinAIVision/ViP-LLaVA
[CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
ggoonnzzaallo/llm_experiments
I play with my best friend GPT
mbzuai-oryx/Video-LLaVA
PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models
doc-doc/NExT-GQA
Can I Trust Your Answer? Visually Grounded Video Question Answering (CVPR'24, Highlight)
XgDuan/WSDEC
Weakly Supervised Dense Event Captioning in Videos, i.e. generating multiple sentence descriptions for a video in a weakly-supervised manner.
Letian2003/C-VQA
Counterfactual Reasoning VQA Dataset
BirdFly16/TO-MAR
ucas-vg/P2BNet
ECCV2022, Point-to-Box Network for Accurate Object Detection via Single Point Supervision