jinliwei1997's Stars
MCG-NJU/MMN
[AAAI 2022] Negative Sample Matters: A Renaissance of Metric Learning for Temporal Grounding
youweiliang/evit
Python code for ICLR 2022 spotlight paper EViT: Expediting Vision Transformers via Token Reorganizations
MCG-NJU/SSD-LT
[ICCV 2021] Self Supervision to Distillation for Long-Tailed Visual Recognition
26hzhang/ReLoCLNet
Video Corpus Moment Retrieval with Contrastive Learning (SIGIR 2021)
MCG-NJU/MGSampler
[ICCV 2021] MGSampler: An Explainable Sampling Strategy for Video Action Recognition
pliang279/awesome-multimodal-ml
Reading list for research topics in multimodal machine learning
Alvin-Zeng/DRN
Dense Regression Network for Video Grounding (CVPR 2020)
JonghwanMun/LGI4temporalgrounding
Repository for the CVPR-20 paper "Local-Global Video-Text Interactions for Temporal Grounding"
JihyongOh/XVFI
[ICCV 2021, Oral 3%] Official repository of XVFI
YehLi/xmodaler
X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense reasoning, and cross-modal retrieval).
hzwer/ECCV2022-RIFE
ECCV2022 - Real-Time Intermediate Flow Estimation for Video Frame Interpolation
zdyshine/Video-Frame-Interpolation-Summary
Video Frame Interpolation Summary and Inference
MCG-NJU/CMPT
[IJCV 2021] Cross-Modal Pyramid Translation for RGB-D Scene Recognition
MCG-NJU/MultiSports
[ICCV 2021] MultiSports: A Multi-Person Video Dataset of Spatio-Temporally Localized Sports Actions
52CV/CVPR-2021-Papers
huggingface/pytorch-image-models
The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (ViT), MobileNetV4, MobileNet-V3 & V2, RegNet, DPN, CSPNet, Swin Transformer, MaxViT, CoAtNet, ConvNeXt, and more
google-research-datasets/wit
WIT (Wikipedia-based Image Text) Dataset is a large multimodal multilingual dataset comprising 37M+ image-text sets with 11M+ unique images across 100+ languages.
yuewang-cuhk/awesome-vision-language-pretraining-papers
Recent Advances in Vision and Language PreTrained Models (VL-PTMs)
antoine77340/MIL-NCE_HowTo100M
PyTorch GPU distributed training code for MIL-NCE HowTo100M
duoergun0729/nlp
By 兜哥 (Dou Ge): <an open-source introductory book on NLP>
hankcs/pyhanlp
Chinese word segmentation
danieljf24/hybrid_space
Source code of our TPAMI'21 paper Dual Encoding for Video Retrieval by Text and CVPR'19 paper Dual Encoding for Zero-Example Video Retrieval.
jeonsworld/ViT-pytorch
PyTorch reimplementation of the Vision Transformer (An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale)
hankcs/HanLP
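Chinese word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, constituency parsing, semantic dependency parsing, semantic role labeling, coreference resolution, style transfer, semantic similarity, new word discovery, keyword and keyphrase extraction, automatic summarization, text classification and clustering, pinyin and simplified/traditional Chinese conversion, natural language processing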
MCG-NJU/TDN
[CVPR 2021] TDN: Temporal Difference Networks for Efficient Action Recognition
MCG-NJU/CPD-Video
Learning Spatiotemporal Features via Video and Text Pair Discrimination
open-mmlab/mmselfsup
OpenMMLab Self-Supervised Learning Toolbox and Benchmark
MDSKUL/MasterProject
Code for my Master's project on VideoBERT
linjieli222/HERO
Research code for EMNLP 2020 paper "HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training"
AmazingUU/Douyin_spider
Douyin crawler