tenaflyyy's Stars
OpenMICG/SwiftCraft3D
Efficient Text-to-3D Generation via Semantic-enhanced Sparse-view Prompting with Hybrid Reconstruction
yformer/EfficientSAM
EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything
OpenMICG/CoCoMeD
Consistency Conditioned Memory Augmented Dynamic Diagnosis Model for Medical Visual Question Answering
OpenMICG/AHP
Adapter-Enhanced Hierarchical Cross-Modal Pre-training for Lightweight Medical Report Generation
OpenMICG/MossVLN
Observation Driven Memory Synergistic Planning for Continuous Vision-Language Navigation
OpenMICG/CSLAKE
A consistent Med-VQA dataset, C-SLAKE, extended by Slake for further consistency assessment.
tenaflyyy/CoCoMeD
Consistency Conditioned Memory Augmented Dynamic Diagnosis Model for Medical Visual Question Answering
OpenMICG/mcg
Multigranularity Contrastive cross-modal collaborative Generation (MCG) model for Video QA
jayleicn/TVQAplus
[ACL 2020] PyTorch code for TVQA+: Spatio-Temporal Grounding for Video Question Answering
jayleicn/TVQA
[EMNLP 2018] PyTorch code for TVQA: Localized, Compositional Video Question Answering
dingmyu/VRDP
[NeurIPS 2021] Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language
chuangg/CLEVRER
PyTorch implementation of ICLR 2020 paper "CLEVRER: CoLlision Events for Video REpresentation and Reasoning"
VRU-NExT/VideoQA
chenfei-wu/TaskMatrix
MILVLG/prophet
Implementation of CVPR 2023 paper "Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering".
Victorwz/VaLM
VaLM: Visually-augmented Language Modeling. ICLR 2023.
salesforce/BLIP
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
CCIIPLab/DPT
The code of IJCAI2022 paper, Declaration-based Prompt Tuning for Visual Question Answering
CurryYuan/X-Trans2Cap
[CVPR 2022] X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning
yuewang-cuhk/awesome-vision-language-pretraining-papers
Recent Advances in Vision and Language PreTrained Models (VL-PTMs)
yz93/LAVT-RIS
minghangz/cpl
CPL: Weakly Supervised Temporal Sentence Grounding with Gaussian-based Contrastive Proposal Learning
floodsung/Deep-Learning-Papers-Reading-Roadmap
Deep Learning papers reading roadmap for anyone who is eager to learn this amazing tech!
facebookresearch/TimeSformer
The official PyTorch implementation of our paper "Is Space-Time Attention All You Need for Video Understanding?"
antoyang/just-ask
[ICCV 2021 Oral + TPAMI] Just Ask: Learning to Answer Questions from Millions of Narrated Videos
facebookresearch/detr
End-to-End Object Detection with Transformers
mttr2021/MTTR
tenaflyyy/ClipBERT
[CVPR 2021 Best Student Paper Honorable Mention, Oral] Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning on image-text and video-text tasks.
tenaflyyy/hcrn-videoqa
Implementation for the paper "Hierarchical Conditional Relation Networks for Video Question Answering" (Le et al., CVPR 2020, Oral)
thaolmk54/hcrn-videoqa
Implementation for the paper "Hierarchical Conditional Relation Networks for Video Question Answering" (Le et al., CVPR 2020, Oral)