yangxiao2's Stars
UCSC-VLAA/CLIPS
An Enhanced CLIP Framework for Learning with Synthetic Captions
microsoft/LLM2CLIP
LLM2CLIP makes SOTA pretrained CLIP models even more SOTA.
google-research-datasets/conceptual-captions
Conceptual Captions is a dataset containing (image-URL, caption) pairs designed for the training and evaluation of machine-learned image captioning systems.
Alibaba-NLP/OmniSearch
Repo for Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent
salesforce/LAVIS
LAVIS - A One-stop Library for Language-Vision Intelligence
StevenGrove/LearnableTreeFilterV2
Hazqeel09/ellzaf_ml
Bridging Research and Practice with PyTorch
QinYang79/Awesome-Noisy-Correspondence
This is a summary of research on noisy correspondence; there may be omissions. If anything is missing, please get in touch with us at linyijie.gm@gmail.com, yangmouxing@gmail.com, or qinyang.gm@gmail.com.
amusi/ECCV2024-Papers-with-Code
A collection of ECCV 2024 papers and open-source projects. Everyone is welcome to submit issues to share ECCV 2024 papers and open-source projects.
ErgastiAlex/MARS
happylinze/multi-modal-tsm
The official repository for the Multi-Modal Visual Pattern Recognition Challenge-Track3 Baseline (ICPR 2024)
XU-TIANYANG/ZOO145
145 video sequences with 20 categories from the zoo, forming a new test set for animal tracking, coined ZOO145
taosdata/TDengine
High-performance, scalable time-series database designed for Industrial IoT (IIoT) scenarios
zengwang430521/TCFormer
The codes for TCFormer in paper: Not All Tokens Are Equal: Human-centric Visual Analysis via Token Clustering Transformer
ArrowLuo/CLIP4Clip
An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"
jihoo-kim/RecSys-Papers-from-SIGIR-2021
Papers related to the Recommender System from SIGIR 2021 (including the links for Paper PDF, Github Code and Dataset)
CryhanFang/CLIP2Video
starmemda/CAMoE
dair-ai/ml-visuals
🎨 ML Visuals contains figures and templates which you can reuse and customize to improve your scientific writing.
JingweiJ/ActionGenome
A video database bridging human actions and human-object relationships
yrcong/STTran
Spatial-Temporal Transformer for Dynamic Scene Graph Generation, ICCV2021
huggingface/transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
jayleicn/ClipBERT
[CVPR 2021 Best Student Paper Honorable Mention, Oral] Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning on image-text and video-text tasks.
m-bain/frozen-in-time
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval [ICCV'21]
XU-TIANYANG/LSDCF
Learning Low-rank and Sparse Discriminative Correlation Filters for Coarse-to-Fine Visual Object Tracking
m-bain/webvid
Large-scale text-video dataset. 10 million captioned short videos.
CRIPAC-DIG/GHRM
[WWW 2021] Source code and datasets for the paper "Graph-based Hierarchical Relevance Matching Signals for Ad-hoc Retrieval".
HuiChen24/IMRAM
code for our CVPR2020 paper "IMRAM: Iterative Matching with Recurrent Attention Memory for Cross-Modal Image-Text Retrieval"
kuanghuei/SCAN
PyTorch source code for "Stacked Cross Attention for Image-Text Matching" (ECCV 2018)
divelab/DeeperGNN
Official PyTorch implementation of "Towards Deeper Graph Neural Networks" [KDD2020]