Pinned Repositories
rift
RIFT: inserting frames wherever there is a gap
Ask-Anything
[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.
ctr-din-pytorch
The Most Complete PyTorch Implementation of "Deep Interest Network for Click-Through Rate Prediction"
HumanMotionQA
Motion Question Answering via Modular Motion Programs
MiniCPM-V
MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
ORViT
"Object-Region Video Transformers", Herzig et al., CVPR 2022
refcoco_data_tool
tarsier
VideoX
VideoX: a collection of video cross-modal models
VILA
VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)
yeyingdege's Repositories
yeyingdege/ctr-din-pytorch
The Most Complete PyTorch Implementation of "Deep Interest Network for Click-Through Rate Prediction"
yeyingdege/refcoco_data_tool
yeyingdege/Ask-Anything
[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.
yeyingdege/HumanMotionQA
Motion Question Answering via Modular Motion Programs
yeyingdege/MiniCPM-V
MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
yeyingdege/ORViT
"Object-Region Video Transformers", Herzig et al., CVPR 2022
yeyingdege/tarsier
yeyingdege/VideoX
VideoX: a collection of video cross-modal models
yeyingdege/VILA
VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)