cross-modal
There are 50 repositories under the cross-modal topic.
jina-ai/discoart
🪩 Create Disco Diffusion artworks in one line
docarray/docarray
Represent, send, store and search multimodal data
shaoxiongji/knowledge-graphs
A collection of research on knowledge graphs
krantiparida/awesome-audio-visual
A curated list of different papers and datasets in various areas of audio-visual processing
kuanghuei/SCAN
PyTorch source code for "Stacked Cross Attention for Image-Text Matching" (ECCV 2018)
towhee-io/examples
Analyze unstructured data with Towhee: reverse image search, reverse video search, audio classification, question answering systems, molecular search, and more.
JizhiziLi/RIM
[CVPR 2023] Referring Image Matting
haihuangcode/CMG
The official implementation of "Achieving Cross Modal Generalization with Multimodal Unified Representation" (NeurIPS 2023)
yisun98/SOLC
Remote sensing SAR-optical land-use classification in PyTorch (high-resolution remote sensing semantic segmentation / land-cover segmentation / land-cover classification)
DRSY/MoTIS
[NAACL 2022] Mobile text-to-image search powered by multimodal semantic representation models (e.g., OpenAI's CLIP)
Zengyi-Qin/Weakly-Supervised-3D-Object-Detection
Weakly Supervised 3D Object Detection from Point Clouds (VS3D), ACM MM 2020
QizhiPei/BioT5
BioT5 (EMNLP 2023) and BioT5+ (ACL 2024 Findings)
qcraftai/distill-bev
DistillBEV: Boosting Multi-Camera 3D Object Detection with Cross-Modal Knowledge Distillation (ICCV 2023)
yangli18/VLTVG
Improving Visual Grounding with Visual-Linguistic Verification and Iterative Reasoning, CVPR 2022
rohitrango/objects-that-sound
Unofficial implementation of Google DeepMind's paper "Objects that Sound"
kywen1119/DSRAN
Code for the journal paper "Learning Dual Semantic Relations with Graph Attention for Image-Text Matching", IEEE TCSVT, 2020.
marslanm/Multimodality-Representation-Learning
A comprehensive collection of research papers on multimodal representation learning, all cited and discussed in the accompanying survey: https://dl.acm.org/doi/abs/10.1145/3617833
Paranioar/UniPT
[CVPR 2024] Code for "UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory"
GT-RIPL/Xmodal-Ctx
Official PyTorch implementation of our CVPR 2022 paper: Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for Image Captioning
Eaphan/UPIDet
Unleash the Potential of Image Branch for Cross-modal 3D Object Detection [NeurIPS 2023]
zjukg/DUET
[AAAI 2023] DUET: Cross-modal Semantic Grounding for Contrastive Zero-shot Learning
zerovl/ZeroVL
[ECCV 2022] Contrastive Vision-Language Pre-training with Limited Resources
mako443/Text2Pos-CVPR2022
Code, dataset and models for our CVPR 2022 publication "Text2Pos"
caoyue10/aaai17-cdq
Implementation of the AAAI 2017 paper "Collective Deep Quantization for Efficient Cross-Modal Retrieval"
smallflyingpig/speech-to-image-translation-without-text
Code for the paper "Direct Speech-to-Image Translation"
catalina17/XFlow
Generalized cross-modal NNs; new audiovisual benchmark (IEEE TNNLS 2019)
mesnico/ALADIN
Official implementation of the paper "ALADIN: Distilling Fine-grained Alignment Scores for Efficient Image-Text Matching and Retrieval"
YangLiu9208/SAKDN
[IEEE T-IP 2021] Semantics-aware Adaptive Knowledge Distillation for Cross-modal Action Recognition
yolo2233/cross-modal-hasing-playground
Python implementation of cross-modal hashing algorithms
bitreidgroup/DSCNet
DSCNet for visible-infrared person re-identification (IEEE TIFS 2022)
Viresh-R/ml-CCA
Implementation of Fast ml-CCA from the ICCV 2015 paper "Multi-Label Cross-Modal Retrieval"
ovshake/cobra
Code for COBRA: Contrastive Bi-Modal Representation Algorithm (https://arxiv.org/abs/2005.03687)
sarahESL/AlignCLIP
AlignCLIP: Improving Cross-Modal Alignment in CLIP
Annusha/xmic
X-MIC: Cross-Modal Instance Conditioning for Egocentric Action Generalization, CVPR 2024
PetarV-/X-CNN
Cross-modal convolutional neural networks
CLT29/semantic_neighborhoods
Preserving Semantic Neighborhoods for Robust Cross-modal Retrieval [ECCV 2020]