Pinned Repositories
CoTNet
This is the official implementation of "Contextual Transformer Networks for Visual Recognition".
CoTNet-ObjectDetection-InstanceSegmentation
image-captioning
Implementation of 'X-Linear Attention Networks for Image Captioning' [CVPR 2020]
efficientvit
Efficient vision foundation models for high-resolution generation and perception.
ConsisID
Identity-Preserving Text-to-Video Generation by Frequency Decomposition
BTO-Net
ImageNetModel
Official ImageNet Model repository
OpenWorldVision
TDEN
xmodaler
X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense reasoning, and cross-modal retrieval).
YehLi's Repositories
YehLi/xmodaler
X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense reasoning, and cross-modal retrieval).
YehLi/ImageNetModel
Official ImageNet Model repository
YehLi/TDEN
YehLi/BTO-Net
YehLi/OpenWorldVision