shallowdream66's Stars
xuanso/uav-206
baaivision/Emu3
Next-Token Prediction is All You Need
zipper112/CDeFuse
GXYM/STGT
Video-Language Alignment via Spatio–Temporal Graph Transformer; ArXiv: https://arxiv.org/abs/2407.11677
PKU-YuanGroup/Video-LLaVA
【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
uark-cviu/Micron-BERT
[CVPR 2023] Micron-BERT: BERT-based Facial Micro-Expression Recognition
ssyze/EVE
EVE: Efficient Vision-Language Pre-training with Masked Prediction and Modality-Aware MoE
Paranioar/Awesome_Matching_Pretraining_Transfering
A paper list covering large multi-modality models, parameter-efficient finetuning, vision-language pretraining, and conventional image-text matching, intended for preliminary insight.
radarFudan/Awesome-state-space-models
Collection of papers on state-space models
amusi/CVPR2024-Papers-with-Code
Collection of CVPR 2024 papers and open-source projects
xai-org/grok-1
Grok open release
jpthu17/HBI
[CVPR 2023 Highlight] Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning
whwu95/Cap4Video
【CVPR'2023 Highlight & TPAMI】Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval?
CompVis/latent-diffusion
High-Resolution Image Synthesis with Latent Diffusion Models