Cece1031's Stars
BradyFU/Awesome-Multimodal-Large-Language-Models
:sparkles::sparkles: Latest Advances on Multimodal Large Language Models
xudejing/video-question-answering
Video Question Answering via Gradually Refined Attention over Appearance and Motion
NExT-GPT/NExT-GPT
Code and models for NExT-GPT: Any-to-Any Multimodal Large Language Model
OpenGVLab/InternVL
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o's performance.
bytedance/SALMONN
SALMONN: Speech Audio Language Music Open Neural Network
schowdhury671/meerkat
DAMO-NLP-SG/VCD
[CVPR 2024 Highlight] Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding
salesforce/LAVIS
LAVIS - A One-stop Library for Language-Vision Intelligence
rikeilong/Bay-CAT
[ECCV’24] Official Implementation for CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenarios
GeWu-Lab/MUSIC-AVQA
MUSIC-AVQA (CVPR 2022 Oral)
Ziyang412/VideoTree
Code for paper "VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos"
OpenGVLab/Ask-Anything
[CVPR 2024 Highlight] [VideoChatGPT] ChatGPT with video understanding! Also supports many more LMs, such as miniGPT4, StableLM, and MOSS.
gyxxyg/VTG-LLM
[Preprint] VTG-LLM: Integrating Timestamp Knowledge into Video LLMs for Enhanced Video Temporal Grounding
MengyuanChen21/CVPR2023-CMPAE
[CVPR 2023] Collecting Cross-Modal Presence-Absence Evidence for Weakly-Supervised Audio-Visual Event Perception
ExplainableML/AVCA-GZSL
This repository contains the code for our CVPR 2022 paper on "Audio-visual Generalised Zero-shot Learning with Cross-modal Attention and Language"
hlchen23/ADPN-MM
Repository for the ACM MM 2023 accepted paper "Curriculum-Listener: Consistency- and Complementarity-Aware Audio-Enhanced Temporal Sentence Grounding"
lucidrains/mirasol-pytorch
Implementation of 🌻 Mirasol, a SOTA multimodal autoregressive model from Google DeepMind, in PyTorch
kyegomez/Mirasol
PyTorch implementation of the model from "Mirasol3B: A Multimodal Autoregressive Model for Time-Aligned and Contextual Modalities"
sauradip/MUPPET
[arXiv 2023] This repository contains the code for "MUPPET: Multi-Modal Few-Shot Temporal Action Detection"
ttgeng233/UnAV
Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline (CVPR 2023)
OFA-Sys/ONE-PEACE
A general representation model across vision, audio, language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
sauradip/STALE
[ECCV 2022] Official PyTorch implementation of the paper "Zero-Shot Temporal Action Detection via Vision-Language Prompting"
jianzongwu/Awesome-Open-Vocabulary
(TPAMI 2024) A Survey on Open Vocabulary Learning
pengsida/learning_research
My personal research experience
Cadene/pretrained-models.pytorch
Pretrained ConvNets for PyTorch: NASNet, ResNeXt, ResNet, InceptionV4, InceptionResNetV2, Xception, DPN, etc.
52CV/CVPR-2023-Papers
52CV/CV-Surveys
Surveys on computer vision topics, including object detection, tracking, and more.
AccumulateMore/OpenCV
✔ (Completed) The most comprehensive OpenCV notes 【咕泡唐宇迪】
statusrank/XCurve
XCurve is an end-to-end PyTorch library for X-Curve metrics optimizations in machine learning.
PKUFlyingPig/cs-self-learning
A self-study guide for computer science