jackchen69's Stars
SunzeY/AlphaCLIP
[CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
Cecile-hi/Multimodal-Learning-with-Alternating-Unimodal-Adaptation
[CVPR 2024] MLA: Multimodal Learning with Alternating Unimodal Adaptation
JiuTian-VL/JiuTian-LION
[CVPR 2024] LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge
PKU-YuanGroup/Chat-UniVi
[CVPR 2024 Highlight🔥] Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
penghao-wu/vstar
PyTorch implementation of "V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs"
neilfei/brivl-nmi
wilson1yan/VideoGPT
facebookresearch/open-eqa
OpenEQA: Embodied Question Answering in the Era of Foundation Models
mlpc-ucsd/TokenCompose
(CVPR 2024) 🧩 TokenCompose: Text-to-Image Diffusion with Token-level Supervision
Beckschen/ViTamin
[CVPR 2024] Official implementation of "ViTamin: Designing Scalable Vision Models in the Vision-language Era"
threestudio-project/threestudio
A unified framework for 3D content generation.
3DTopia/GPTEval3D
[CVPR 2024] Implementation of "GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation"
vkhoi/cora_cvpr24
ailab-kyunghee/CM2_DVC
[CVPR 2024] Do You Remember? Dense Video Captioning with Cross-Modal Memory Retrieval
weihao1115/mm-sam
The official implementation of "Segment Anything with Multiple Modalities".
adobe-research/MagicFixup
THUDM/CogVideo
Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
open-compass/VLMEvalKit
Open-source evaluation toolkit for large vision-language models (LVLMs), supporting ~100 VLMs and 40+ benchmarks
facebookresearch/sapiens
High-resolution vision models for human-centric tasks.
allenai/OLMoE
OLMoE: Open Mixture-of-Experts Language Models
Tencent/DepthCrafter
DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos
run-llama/llamacloud-demo
opendilab/PsyDI
PsyDI: Towards a Personalized and Progressively In-depth Chatbot for Psychological Measurements (e.g., an MBTI measurement agent).
microsoft/eureka-ml-insights
A framework for standardizing evaluations of large foundation models, beyond single-score reporting and rankings.
thomasturek/aquinas
feiyuchen7/M3NET
PyTorch implementation of the CVPR 2023 paper "Multivariate, Multi-frequency and Multimodal: Rethinking Graph Neural Networks for Emotion Recognition in Conversation".
EMOsuperb/EMO-SUPERB-submission
EMO-SUPERB submission
praveena2j/Joint-Cross-Attention-for-Audio-Visual-Fusion
IEEE T-BIOM: "Audio-Visual Fusion for Emotion Recognition in the Valence-Arousal Space Using Joint Cross-Attention"
esmam-ai/MultisensoryEmotions
Bio-inspired multisensory emotion recognition
yuntaoshou/CBERL