CleyLyChen's Repositories
CleyLyChen/AllSpark
[CVPR 2024] AllSpark: Reborn Labeled Features from Unlabeled in Transformer for Semi-Supervised Semantic Segmentation
CleyLyChen/AVSegFormer
[AAAI 2024] AVSegFormer: Audio-Visual Segmentation with Transformer
CleyLyChen/bubogpt
BuboGPT: Enabling Visual Grounding in Multi-Modal LLMs
CleyLyChen/CleyLyChen
Config files for my GitHub profile.
CleyLyChen/CogVLM
A state-of-the-art open visual language model | multimodal pretrained model
CleyLyChen/COMBO-AVS
Official implementation of the paper "Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation"
CleyLyChen/Generalizable-Audio-Visual-Segmentation
Official repository of "Prompting Segmentation with Sound is Generalizable Audio-Visual Source Localizer", AAAI 2024
CleyLyChen/learning_to_localize_sound_source
Codebase and dataset for the paper "Learning to Localize Sound Source in Visual Scenes"
CleyLyChen/MA-LMM
CleyLyChen/MiniGPT-4
Open-source code for MiniGPT-4 and MiniGPT-v2 (https://minigpt-4.github.io, https://minigpt-v2.github.io/)
CleyLyChen/Video-LLaMA
[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding