HongkuanZhang's Stars
ramavedantam/cider
Python code for CIDEr (Consensus-based Image Description Evaluation), an image caption evaluation metric
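CIDEr scores a candidate caption by TF-IDF-weighted n-gram similarity against a consensus of reference captions. A minimal, unigram-only sketch of that idea (the real metric uses stemmed 1-4-grams and a length penalty; all function names here are illustrative, not from the repo):

```python
from collections import Counter
import math

def tfidf_vec(tokens, df, num_docs):
    # TF-IDF weighted unigram vector (simplified: real CIDEr uses 1-4-grams)
    tf = Counter(tokens)
    return {w: c * math.log(num_docs / (1 + df.get(w, 0))) for w, c in tf.items()}

def cosine(a, b):
    dot = sum(a[w] * b.get(w, 0.0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def cider_like(candidate, references, corpus_refs):
    # Consensus idea: average similarity of the candidate to each reference,
    # with rare (informative) words up-weighted by IDF over the corpus.
    num_docs = len(corpus_refs)
    df = Counter(w for doc in corpus_refs for w in set(doc.split()))
    cand_vec = tfidf_vec(candidate.split(), df, num_docs)
    sims = [cosine(cand_vec, tfidf_vec(r.split(), df, num_docs))
            for r in references]
    return sum(sims) / len(sims)
```

A caption identical to its reference scores 1.0; unrelated captions score lower because they share only low-IDF common words.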
showlab/EgoVLP
[NeurIPS 2022] Egocentric Video-Language Pretraining
OFA-Sys/OFA
Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
Vision-CAIR/VisualGPT
VisualGPT (CVPR 2022): using GPT as a decoder for vision-language models
HuiGuanLab/nrccr
Source code of our MM'22 paper Cross-Lingual Cross-Modal Retrieval with Noise-Robust Learning
fancy88/iBook
A collection of e-books
neubig/util-scripts
Various utility scripts useful for natural language processing, machine translation, etc.
woojeongjin/FewVLM
A Good Prompt Is Worth Millions of Parameters: Low-resource Prompt-based Learning for Vision-Language Models (ACL 2022)
v-iashin/video_features
Extract video features from raw videos using multiple GPUs. We support RAFT flow frames as well as S3D, I3D, R(2+1)D, VGGish, CLIP, and TIMM models.
NVlabs/SegFormer
Official PyTorch implementation of SegFormer
google-research/vision_transformer
Alibaba-MIIL/STAM
Official implementation of "An Image is Worth 16x16 Words, What is a Video Worth?" (2021)
google-research/scenic
Scenic: A Jax Library for Computer Vision Research and Beyond
ttengwang/Awesome_Prompting_Papers_in_Computer_Vision
A curated list of prompt-based papers in computer vision and vision-language learning.
KaiyangZhou/CoOp
Prompt Learning for Vision-Language Models (IJCV'22, CVPR'22)
JingfengYang/Multi-modal-Deep-Learning
tesseract-ocr/tesseract
Tesseract Open Source OCR Engine (main repository)
lixin4ever/Conference-Acceptance-Rate
Acceptance rates for the major AI conferences
dair-ai/ml-visuals
🎨 ML Visuals contains figures and templates which you can reuse and customize to improve your scientific writing.
MILVLG/bottom-up-attention.pytorch
A PyTorch reimplementation of bottom-up-attention models
microsoft/unilm
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
josephch405/curriculum-nmt
NLP2CT/norm-nmt
Norm-Based Curriculum Learning for Neural Machine Translation (ACL 2020)
YehLi/xmodaler
X-modaler is a versatile, high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense reasoning, and cross-modal retrieval).
hwanheelee1993/UMIC
An unreferenced image captioning metric (ACL-21)
ChenRocks/UNITER
Research code for ECCV 2020 paper "UNITER: UNiversal Image-TExt Representation Learning"
NLP2CT/ua-cl-nmt
Uncertainty-Aware Curriculum Learning for Neural Machine Translation (ACL 2020)
openai/CLIP
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
pliang279/awesome-multimodal-ml
Reading list for research topics in multimodal machine learning
Eurus-Holmes/Awesome-Multimodal-Research
A curated list of Multimodal Related Research.