yuanrr
Ph.D. student focusing on image and video understanding, e.g., visual question answering and video question answering.
Pinned Repositories
LGVA_VideoQA
Language-Guided Visual Aggregation for Video Question Answering
Flipped-VQA
Large Language Models are Temporal and Causal Reasoners for Video Question Answering (EMNLP 2023)
Self-PT
Self-PT: Adaptive Self-Prompt Tuning for Low-Resource Visual Question Answering (ACM MM 2023)
Ask-Anything
[CVPR 2024 Highlight][VideoChatGPT] ChatGPT with video understanding, plus support for many more LMs such as miniGPT4, StableLM, and MOSS.
logit-standardization-KD
[CVPR 2024 Highlight] Logit Standardization in Knowledge Distillation
yuanrr's Repositories
yuanrr/SEMA
SEMA: Semantic Distance Adversarial Learning for Text-to-Image Synthesis (TMM 2023)
yuanrr/CoMa
yuanrr/ICMRSS
Knowledge-Driven Analysis and Retrieval on Multimedia.
yuanrr/Self-PT
Self-PT: Adaptive Self-Prompt Tuning for Low-Resource Visual Question Answering (ACM MM 2023)
yuanrr/UCT
UCT: Unbiased Feature Learning with Causal Intervention for Visible-Infrared Person Re-identification (Under review)