YehLi

Sun Yat-sen University

Pinned Repositories

CoTNet
This is an official implementation for "Contextual Transformer Networks for Visual Recognition".
Language: Python 527 10 3381
CoTNet-ObjectDetection-InstanceSegmentation
Language: Python 33 2 58
image-captioning
Implementation of 'X-Linear Attention Networks for Image Captioning' [CVPR 2020]
Language: Python 273 4 3455
efficientvit
Efficient vision foundation models for high-resolution generation and perception.
Language: Python 2.6k 40 158213
ConsisID
Identity-Preserving Text-to-Video Generation by Frequency Decomposition
Language: Python 583 11 3731
BTO-Net
Language: Python 3 1 10
ImageNetModel
Official ImageNet Model repository
Language: Jupyter Notebook 244 5 1338
OpenWorldVision
Language: Python 3 2 10
TDEN
Language: Python 9 0 01
xmodaler
X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense reasoning, and cross-modal retrieval).
Language: Python 968 28 63105

YehLi's Repositories

YehLi/xmodaler
X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense reasoning, and cross-modal retrieval).
Language: Python 968 28 63105
YehLi/ImageNetModel
Official ImageNet Model repository
Language: Jupyter Notebook 244 5 1338
YehLi/TDEN
Language: Python 9 0 01
YehLi/BTO-Net
Language: Python 3 1 10
YehLi/OpenWorldVision
Language: Python 3 2 10