visually-grounded-speech

There are 6 repositories under the visually-grounded-speech topic.

  • atosystem/SpeechCLIP

    SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model, accepted to IEEE SLT 2022. (A minimal sketch of the speech-image alignment idea shared by these projects follows the list.)

    Language: Python
  • bhigy/zr-2021vg_baseline

    Baselines for the Zero Resource Speech Challenge using visually grounded models of spoken language, 2021 edition.

    Language: Python
  • spokenlanguage/platalea

    Library for training visually-grounded models of spoken language understanding.

    Language: Python
  • bhigy/textual-supervision

    Code for the paper "Textual supervision for visually grounded spoken language understanding".

    Language: Python
  • ShampooWang/SpeechCLIP_plus

    SpeechCLIP+: Self-supervised multi-task representation learning for speech via CLIP and speech-image data. Accepted to ICASSP 2024, Self-supervision in Audio, Speech, and Beyond (SASB) workshop.

    Language: Python
  • aayushi12/thesis_dss

    Code used in my Master's thesis.

    Language: Python
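
All six projects share one core mechanism: a speech encoder is trained so that utterance embeddings land close to the embeddings of the paired images (typically produced by a frozen image encoder such as CLIP's). The following is a minimal, illustrative PyTorch sketch of that idea, not the SpeechCLIP implementation; the encoder architecture, dimensions, temperature, and all names are assumptions made for the example.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SpeechEncoder(nn.Module):
        """Toy speech encoder (illustrative): encodes a feature sequence,
        mean-pools over time, and projects into the assumed joint
        embedding space of the image encoder."""
        def __init__(self, feat_dim=80, embed_dim=512):
            super().__init__()
            self.rnn = nn.GRU(feat_dim, 256, batch_first=True, bidirectional=True)
            self.proj = nn.Linear(512, embed_dim)

        def forward(self, feats):                # feats: (B, T, feat_dim)
            out, _ = self.rnn(feats)             # (B, T, 512)
            pooled = out.mean(dim=1)             # temporal mean pooling
            return F.normalize(self.proj(pooled), dim=-1)

    def contrastive_loss(speech_emb, image_emb, temperature=0.07):
        """Symmetric InfoNCE loss: the matched speech/image pair for each
        example shares the same index within the batch."""
        logits = speech_emb @ image_emb.t() / temperature   # (B, B) similarities
        targets = torch.arange(logits.size(0))
        return (F.cross_entropy(logits, targets) +
                F.cross_entropy(logits.t(), targets)) / 2

    if __name__ == "__main__":
        B, T = 8, 100
        encoder = SpeechEncoder()
        speech = encoder(torch.randn(B, T, 80))             # random stand-in features
        # Stand-in for frozen CLIP image embeddings of the paired images.
        images = F.normalize(torch.randn(B, 512), dim=-1)
        print(contrastive_loss(speech, images).item())

In the actual repositories the speech encoder is far stronger (e.g. built on self-supervised speech models), the image side comes from a pretrained vision or vision-language model, and the batch pairs come from spoken-caption datasets; the contrastive alignment objective above is the common thread.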