cross-modal

There are 50 repositories under the cross-modal topic.

  • jina-ai/discoart

    🪩 Create Disco Diffusion artworks in one line

    Language:Python3.8k34107248
  • docarray/docarray

    Represent, send, store and search multimodal data

    Language:Python3k46639232
  • shaoxiongji/knowledge-graphs

    A collection of research on knowledge graphs

    Language:JavaScript1.7k636293
  • krantiparida/awesome-audio-visual

    A curated list of different papers and datasets in various areas of audio-visual processing

  • kuanghuei/SCAN

    PyTorch source code for "Stacked Cross Attention for Image-Text Matching" (ECCV 2018)

    Language:Python5551061113
  • towhee-io/examples

    Analyze unstructured data with Towhee: reverse image search, reverse video search, audio classification, question-answering systems, molecular search, and more.

    Language:Jupyter Notebook471781115
  • JizhiziLi/RIM

    [CVPR 2023] Referring Image Matting

  • haihuangcode/CMG

    Official implementation of "Achieving Cross Modal Generalization with Multimodal Unified Representation" (NeurIPS '23)

    Language:Python2043166
  • yisun98/SOLC

    Remote sensing SAR-optical land-use classification in PyTorch (high-resolution remote sensing semantic segmentation / ground-object segmentation / ground-object classification)

    Language:Python19422226
  • DRSY/MoTIS

    [NAACL 2022] Mobile text-to-image search powered by multimodal semantic representation models (e.g., OpenAI's CLIP)

    Language:Swift1234710
  • Zengyi-Qin/Weakly-Supervised-3D-Object-Detection

    Weakly Supervised 3D Object Detection from Point Clouds (VS3D), ACM MM 2020

    Language:Jupyter Notebook1067618
  • QizhiPei/BioT5

    BioT5 (EMNLP 2023) and BioT5+ (ACL 2024 Findings)

    Language:Python1013135
  • qcraftai/distill-bev

    DistillBEV: Boosting Multi-Camera 3D Object Detection with Cross-Modal Knowledge Distillation (ICCV 2023)

    Language:Python945186
  • yangli18/VLTVG

    Improving Visual Grounding with Visual-Linguistic Verification and Iterative Reasoning, CVPR 2022

    Language:Python942218
  • rohitrango/objects-that-sound

    Unofficial implementation of Google DeepMind's paper "Objects that Sound"

    Language:Python835816
  • kywen1119/DSRAN

    Code for journal paper "Learning Dual Semantic Relations with Graph Attention for Image-Text Matching", TCSVT, 2020.

    Language:Python7241512
  • marslanm/Multimodality-Representation-Learning

    This repository provides a comprehensive collection of research papers on multimodal representation learning, all of which are cited and discussed in the recently accepted survey (https://dl.acm.org/doi/abs/10.1145/3617833).

  • Paranioar/UniPT

    [CVPR 2024] Code for "UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory"

    Language:Python66151
  • GT-RIPL/Xmodal-Ctx

    Official PyTorch implementation of our CVPR 2022 paper: Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for Image Captioning

    Language:Python6021110
  • Eaphan/UPIDet

    Unleash the Potential of Image Branch for Cross-modal 3D Object Detection [NeurIPS2023]

    Language:Python576227
  • zjukg/DUET

    [AAAI 2023] DUET: Cross-modal Semantic Grounding for Contrastive Zero-shot Learning

    Language:Python49438
  • zerovl/ZeroVL

    [ECCV2022] Contrastive Vision-Language Pre-training with Limited Resources

    Language:Python45355
  • mako443/Text2Pos-CVPR2022

    Code, dataset and models for our CVPR 2022 publication "Text2Pos"

    Language:Python423126
  • caoyue10/aaai17-cdq

    The implementation of AAAI-17 paper "Collective Deep Quantization of Efficient Cross-modal Retrieval"

    Language:Python352124
  • smallflyingpig/speech-to-image-translation-without-text

    Code for the paper "Direct Speech-to-Image Translation"

    Language:Python27306
  • catalina17/XFlow

    Generalized cross-modal NNs; new audiovisual benchmark (IEEE TNNLS 2019)

    Language:Python25333
  • mesnico/ALADIN

    Official implementation of the paper "ALADIN: Distilling Fine-grained Alignment Scores for Efficient Image-Text Matching and Retrieval"

    Language:Python23535
  • YangLiu9208/SAKDN

    [IEEE T-IP 2021] Semantics-aware Adaptive Knowledge Distillation for Cross-modal Action Recognition

    Language:Python23453
  • yolo2233/cross-modal-hasing-playground

    Python implementation of cross-modal hashing algorithms

    Language:Python22333
  • bitreidgroup/DSCNet

    DSCNet: Visible-Infrared Person ReID (TIFS 2022)

    Language:Python21133
  • Viresh-R/ml-CCA

    Implementation of Fast ml-CCA from the ICCV-2015 work "Multi-Label Cross-Modal Retrieval"

    Language:Matlab21103
  • ovshake/cobra

    Code for COBRA: Contrastive Bi-Modal Representation Algorithm (https://arxiv.org/abs/2005.03687)

    Language:Python15253
  • sarahESL/AlignCLIP

    AlignCLIP: Improving Cross-Modal Alignment in CLIP

    Language:Python13310
  • Annusha/xmic

    X-MIC: Cross-Modal Instance Conditioning for Egocentric Action Generalization, CVPR 2024

    Language:Python11330
  • PetarV-/X-CNN

    Cross-modal convolutional neural networks

    Language:Python11719
  • CLT29/semantic_neighborhoods

    Preserving Semantic Neighborhoods for Robust Cross-modal Retrieval [ECCV 2020]

    Language:Python9316
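Several of the repositories above (image-text matching, text-to-image search, cross-modal retrieval) rank items from one modality against a query from another by comparing embeddings in a shared space. The sketch below is a generic, minimal illustration of that retrieval step and of the Recall@K metric often reported for it. It is not taken from any listed repository: the function names are hypothetical, and the toy vectors stand in for real text- and image-encoder outputs.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def rank_gallery(query, gallery):
    """Return gallery indices sorted from most to least similar to the query."""
    q = l2_normalize(query)
    scores = []
    for idx, item in enumerate(gallery):
        g = l2_normalize(item)
        scores.append((sum(a * b for a, b in zip(q, g)), idx))
    # Sort by descending cosine score; ties are broken by the smaller index.
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scores]


def recall_at_k(queries, gallery, k):
    """Fraction of queries whose matching item (same index) is in the top k."""
    hits = 0
    for i, query in enumerate(queries):
        if i in rank_gallery(query, gallery)[:k]:
            hits += 1
    return hits / len(queries)


# Hypothetical toy embeddings standing in for encoder outputs;
# caption i is assumed to describe image i.
text_embeddings = [
    [0.9, 0.1, 0.0],
    [0.1, 0.8, 0.2],
    [0.0, 0.2, 0.9],
]
image_embeddings = [
    [1.0, 0.0, 0.1],
    [0.2, 0.9, 0.1],
    [0.1, 0.1, 1.0],
]

print(rank_gallery(text_embeddings[0], image_embeddings))      # [0, 1, 2]
print(recall_at_k(text_embeddings, image_embeddings, k=1))     # 1.0
```

Normalizing both sides first means the ranking depends only on direction in the shared space, which is the usual setup for cosine-based retrieval; real systems replace the toy lists with learned encoders and an approximate nearest-neighbor index.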