cross-modal
There are 50 repositories under the cross-modal topic.
jina-ai/discoart
🪩 Create Disco Diffusion artworks in one line
docarray/docarray
Represent, send, store and search multimodal data
shaoxiongji/knowledge-graphs
A collection of research on knowledge graphs
krantiparida/awesome-audio-visual
A curated list of different papers and datasets in various areas of audio-visual processing
kuanghuei/SCAN
PyTorch source code for "Stacked Cross Attention for Image-Text Matching" (ECCV 2018)
towhee-io/examples
Analyze unstructured data with Towhee: reverse image search, reverse video search, audio classification, question answering systems, molecular search, and more.
JizhiziLi/RIM
[CVPR 2023] Referring Image Matting
haihuangcode/CMG
The official implementation of "Achieving Cross Modal Generalization with Multimodal Unified Representation" (NeurIPS 2023)
yisun98/SOLC
Remote sensing SAR-optical land-use classification in PyTorch (high-resolution remote sensing semantic segmentation / land-cover segmentation / land-cover classification)
DRSY/MoTIS
[NAACL 2022] Mobile text-to-image search powered by multimodal semantic representation models (e.g., OpenAI's CLIP)
Zengyi-Qin/Weakly-Supervised-3D-Object-Detection
Weakly Supervised 3D Object Detection from Point Clouds (VS3D), ACM MM 2020
QizhiPei/BioT5
BioT5 (EMNLP 2023) and BioT5+ (ACL 2024 Findings)
qcraftai/distill-bev
DistillBEV: Boosting Multi-Camera 3D Object Detection with Cross-Modal Knowledge Distillation (ICCV 2023)
yangli18/VLTVG
Improving Visual Grounding with Visual-Linguistic Verification and Iterative Reasoning, CVPR 2022
rohitrango/objects-that-sound
Unofficial implementation of Google DeepMind's paper "Objects that Sound"
kywen1119/DSRAN
Code for the journal paper "Learning Dual Semantic Relations with Graph Attention for Image-Text Matching", IEEE TCSVT, 2020.
marslanm/Multimodality-Representation-Learning
A comprehensive collection of research papers on multimodal representation learning, all cited and discussed in the accompanying survey: https://dl.acm.org/doi/abs/10.1145/3617833
Paranioar/UniPT
[CVPR 2024] Code for "UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory"
GT-RIPL/Xmodal-Ctx
Official PyTorch implementation of our CVPR 2022 paper: Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for Image Captioning
Eaphan/UPIDet
Unleash the Potential of Image Branch for Cross-modal 3D Object Detection [NeurIPS 2023]
zjukg/DUET
[AAAI 2023] DUET: Cross-modal Semantic Grounding for Contrastive Zero-shot Learning
zerovl/ZeroVL
[ECCV 2022] Contrastive Vision-Language Pre-training with Limited Resources
mako443/Text2Pos-CVPR2022
Code, dataset and models for our CVPR 2022 publication "Text2Pos"
caoyue10/aaai17-cdq
Implementation of the AAAI 2017 paper "Collective Deep Quantization for Efficient Cross-Modal Retrieval"
smallflyingpig/speech-to-image-translation-without-text
Code for the paper "Direct Speech-to-Image Translation"
catalina17/XFlow
Generalized cross-modal NNs; new audiovisual benchmark (IEEE TNNLS 2019)
mesnico/ALADIN
Official implementation of the paper "ALADIN: Distilling Fine-grained Alignment Scores for Efficient Image-Text Matching and Retrieval"
YangLiu9208/SAKDN
[IEEE T-IP 2021] Semantics-aware Adaptive Knowledge Distillation for Cross-modal Action Recognition
yolo2233/cross-modal-hasing-playground
Python implementation of cross-modal hashing algorithms
bitreidgroup/DSCNet
DSCNet for visible-infrared person re-identification (IEEE TIFS 2022)
Viresh-R/ml-CCA
Implementation of Fast ml-CCA from the ICCV 2015 paper "Multi-Label Cross-Modal Retrieval"
ovshake/cobra
Code for COBRA: Contrastive Bi-Modal Representation Algorithm (https://arxiv.org/abs/2005.03687)
sarahESL/AlignCLIP
AlignCLIP: Improving Cross-Modal Alignment in CLIP
Annusha/xmic
X-MIC: Cross-Modal Instance Conditioning for Egocentric Action Generalization, CVPR 2024
PetarV-/X-CNN
Cross-modal convolutional neural networks
CLT29/semantic_neighborhoods
Preserving Semantic Neighborhoods for Robust Cross-modal Retrieval [ECCV 2020]