jackchen69's Stars
SunzeY/AlphaCLIP
[CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
Cecile-hi/Multimodal-Learning-with-Alternating-Unimodal-Adaptation
[CVPR 2024] MLA: Multimodal Learning with Alternating Unimodal Adaptation
JiuTian-VL/JiuTian-LION
[CVPR 2024] LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge
PKU-YuanGroup/Chat-UniVi
[CVPR 2024 Highlight🔥] Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
penghao-wu/vstar
PyTorch implementation of "V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs"
neilfei/brivl-nmi
wilson1yan/VideoGPT
facebookresearch/open-eqa
OpenEQA: Embodied Question Answering in the Era of Foundation Models
mlpc-ucsd/TokenCompose
(CVPR 2024) 🧩 TokenCompose: Text-to-Image Diffusion with Token-level Supervision
Beckschen/ViTamin
[CVPR 2024] Official implementation of "ViTamin: Designing Scalable Vision Models in the Vision-language Era"
threestudio-project/threestudio
A unified framework for 3D content generation.
3DTopia/GPTEval3D
[CVPR 2024] Implementation of "GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation"
vkhoi/cora_cvpr24
ailab-kyunghee/CM2_DVC
[CVPR 2024] Do You Remember? Dense Video Captioning with Cross-Modal Memory Retrieval
weihao1115/mm-sam
The official implementation of "Segment Anything with Multiple Modalities".
adobe-research/MagicFixup
THUDM/CogVideo
Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
open-compass/VLMEvalKit
Open-source evaluation toolkit for large vision-language models (LVLMs), supporting ~100 VLMs and 40+ benchmarks
facebookresearch/sapiens
High-resolution vision models for human-centric tasks.
allenai/OLMoE
OLMoE: Open Mixture-of-Experts Language Models
Tencent/DepthCrafter
DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos
run-llama/llamacloud-demo
opendilab/PsyDI
PsyDI: Towards a Personalized and Progressively In-depth Chatbot for Psychological Measurements (e.g., an MBTI measurement agent).
microsoft/eureka-ml-insights
A framework for standardizing evaluations of large foundation models, beyond single-score reporting and rankings.
thomasturek/aquinas
feiyuchen7/M3NET
PyTorch implementation of the CVPR 2023 paper "Multivariate, Multi-frequency and Multimodal: Rethinking Graph Neural Networks for Emotion Recognition in Conversation".
EMOsuperb/EMO-SUPERB-submission
EMO-SUPERB submission
praveena2j/Joint-Cross-Attention-for-Audio-Visual-Fusion
IEEE T-BIOM: "Audio-Visual Fusion for Emotion Recognition in the Valence-Arousal Space Using Joint Cross-Attention"
esmam-ai/MultisensoryEmotions
Bio-inspired multisensory emotion recognition
yuntaoshou/CBERL