jasongief's Stars
babysor/MockingBird
🚀 AI voice cloning: Clone a voice in 5 seconds to generate arbitrary speech in real-time
IDEA-Research/Grounded-Segment-Anything
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment, and Generate Anything
AIGC-Audio/AudioGPT
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
salesforce/LAVIS
LAVIS - A One-stop Library for Language-Vision Intelligence
jason718/awesome-self-supervised-learning
A curated list of awesome self-supervised methods
sail-sg/EditAnything
Edit anything in images powered by segment-anything, ControlNet, StableDiffusion, etc. (ACM MM)
salesforce/ALBEF
Code for ALBEF: a new vision-language pre-training method
AlexHex7/Non-local_pytorch
Implementation of Non-local Block.
EdisonLeeeee/Awesome-Masked-Autoencoders
A collection of literature after or concurrent with Masked Autoencoder (MAE) (Kaiming He et al.).
zhenyuw16/UniDetector
Code release for our CVPR 2023 paper "Detecting Everything in the Open World: Towards Universal Object Detection".
linzhiqiu/cross_modal_adaptation
Cross-modal few-shot adaptation with CLIP
GeWu-Lab/OGM-GE_CVPR2022
The repo for "Balanced Multimodal Learning via On-the-fly Gradient Modulation", CVPR 2022 (ORAL)
OpenNLPLab/TransnormerLLM
Official implementation of TransNormerLLM: A Faster and Better LLM
ziqipang/LM4VisualEncoding
[ICLR 2024 (Spotlight)] "Frozen Transformers in Language Models are Effective Visual Encoder Layers"
OpenNLPLab/FAVDBench
[CVPR 2023] Official implementation of the paper: Fine-grained Audible Video Description
OpenNLPLab/Tnn
[ICLR 2023] Official implementation of TNN in our ICLR 2023 paper - Toeplitz Neural Network for Sequence Modeling
MengyuanChen21/Awesome-Evidential-Deep-Learning
A curated publication list on evidential deep learning.
haoyi-duan/DG-SCT
[NeurIPS 2023] Official implementation code
OpenNLPLab/Transnormer
[EMNLP 2022] Official implementation of Transnormer in our EMNLP 2022 paper - The Devil in Linear Transformer
MengyuanChen21/CVPR2023-CMPAE
[CVPR 2023] Collecting Cross-Modal Presence-Absence Evidence for Weakly-Supervised Audio-Visual Event Perception
MengyuanChen21/NeurIPS2024-CSP
[NeurIPS 2024] Conjugated Semantic Pool Improves OOD Detection with Pre-trained Vision-Language Models
jasongief/CPSP
[TPAMI 2023] Contrastive Positive Sample Propagation along the Audio-Visual Event Line
OpenNLPLab/FNAC_AVL
[CVPR 2023] Official implementation of our paper - Learning Audio-Visual Source Localization via False Negative Aware Contrastive Learning
Georgelingzj/up-to-date-Vision-Language-Models
An up-to-date collection of Vision-Language Models, with a main focus on computer vision
VUT-HFUT/Micro-Action
[TCSVT 2024] Official implementation of the paper: Benchmarking Micro-action Recognition: Dataset, Methods, and Applications
GeWu-Lab/TSPM
Official repository for "Boosting Audio Visual Question Answering via Key Semantic-Aware Cues" in ACM MM 2024.
GeWu-Lab/LFAV
Towards Long Form Audio-visual Video Understanding
VUT-HFUT/MiGA2023_Track1
[IJCAI 2023] Champion of the Micro-gesture Classification sub-challenge in MiGA@IJCAI 2023.
zhangbin-ai/APL
APL for the audio-visual question answering (AVQA) task
jinxiang-liu/SSL-TIE
Official code for the ACM MM 2022 paper "Exploiting Transformation Invariance and Equivariance for Self-supervised Sound Localisation"