visually-grounded-speech
There are 6 repositories under the visually-grounded-speech topic.
atosystem/SpeechCLIP
SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model, Accepted to IEEE SLT 2022
bhigy/zr-2021vg_baseline
Baselines for the Zero Resource Speech Challenge using Visually Grounded Models of Spoken Language, 2021 edition.
spokenlanguage/platalea
Library for training visually-grounded models of spoken language understanding.
bhigy/textual-supervision
Code for the paper "Textual Supervision for Visually Grounded Spoken Language Understanding".
ShampooWang/SpeechCLIP_plus
SpeechCLIP+: Self-supervised multi-task representation learning for speech via CLIP and speech-image data. Accepted to ICASSP 2024, Self-supervision in Audio, Speech, and Beyond (SASB) workshop.
aayushi12/thesis_dss
Code used in my master's thesis.