jinliwei1997

jinliwei1997's Stars

MCG-NJU/MMN
[AAAI 2022] Negative Sample Matters: A Renaissance of Metric Learning for Temporal Grounding
Language:Python888
youweiliang/evit
Python code for ICLR 2022 spotlight paper EViT: Expediting Vision Transformers via Token Reorganizations
Language:Python16819
MCG-NJU/SSD-LT
[ICCV 2021] Self Supervision to Distillation for Long-Tailed Visual Recognition
Language:Python222
26hzhang/ReLoCLNet
Video Corpus Moment Retrieval with Contrastive Learning (SIGIR 2021)
Language:Python517
MCG-NJU/MGSampler
[ICCV 2021] MGSampler: An Explainable Sampling Strategy for Video Action Recognition
Language:Python477
pliang279/awesome-multimodal-ml
Reading list for research topics in multimodal machine learning
5.9k844
Alvin-Zeng/DRN
Dense Regression Network for Video Grounding (CVPR2020)
Language:Python5011
JonghwanMun/LGI4temporalgrounding
Repository for the CVPR-20 paper "Local-Global Video-Text Interactions for Temporal Grounding"
Language:Python12917
JihyongOh/XVFI
[ICCV 2021, Oral 3%] Official repository of XVFI
Language:Python28039
YehLi/xmodaler
X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense reasoning, and cross-modal retrieval).
Language:Python1k111
hzwer/ECCV2022-RIFE
ECCV2022 - Real-Time Intermediate Flow Estimation for Video Frame Interpolation
Language:Python4.4k433
zdyshine/Video-Frame-Interpolation-Summary
Video Frame Interpolation Summary and Inference
Language:Python10914
MCG-NJU/CMPT
[IJCV 2021] Cross-Modal Pyramid Translation for RGB-D Scene Recognition
Language:Python81
MCG-NJU/MultiSports
[ICCV 2021] MultiSports: A Multi-Person Video Dataset of Spatio-Temporally Localized Sports Actions
Language:Python1067
52CV/CVPR-2021-Papers
2.5k312
huggingface/pytorch-image-models
The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (ViT), MobileNetV4, MobileNet-V3 & V2, RegNet, DPN, CSPNet, Swin Transformer, MaxViT, CoAtNet, ConvNeXt, and more
Language:Python31.5k4.7k
google-research-datasets/wit
WIT (Wikipedia-based Image Text) Dataset is a large multimodal multilingual dataset comprising 37M+ image-text sets with 11M+ unique images across 100+ languages.
99340
yuewang-cuhk/awesome-vision-language-pretraining-papers
Recent Advances in Vision and Language PreTrained Models (VL-PTMs)
1.1k102
antoine77340/MIL-NCE_HowTo100M
PyTorch GPU distributed training code for MIL-NCE HowTo100M
Language:Python21331
duoergun0729/nlp
By 兜哥 (Dou Ge): an open-source introductory book on NLP
Language:Python2.3k558
hankcs/pyhanlp
Chinese word segmentation
Language:Python3.1k806
danieljf24/hybrid_space
Source code of our TPAMI'21 paper Dual Encoding for Video Retrieval by Text and CVPR'19 paper Dual Encoding for Zero-Example Video Retrieval.
Language:Python8717
jeonsworld/ViT-pytorch
PyTorch reimplementation of the Vision Transformer (An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale)
Language:Jupyter Notebook1.9k363
hankcs/HanLP
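Natural language processing: Chinese word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, constituency parsing, semantic dependency parsing, semantic role labeling, coreference resolution, style transfer, semantic similarity, new word discovery, keyword and phrase extraction, automatic summarization, text classification and clustering, and pinyin and Simplified/Traditional Chinese conversion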
Language:Python33.5k10k
MCG-NJU/TDN
[CVPR 2021] TDN: Temporal Difference Networks for Efficient Action Recognition
Language:Python36855
MCG-NJU/CPD-Video
Learning Spatiotemporal Features via Video and Text Pair Discrimination
Language:Python5913
open-mmlab/mmselfsup
OpenMMLab Self-Supervised Learning Toolbox and Benchmark
Language:Python3.2k428
MDSKUL/MasterProject
Code for my Master's project on VideoBERT
Language:Python377
linjieli222/HERO
Research code for EMNLP 2020 paper "HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training"
Language:Python22834
AmazingUU/Douyin_spider
Douyin web crawler
Language:Python411154