visually-grounded-speech
There are 6 repositories under the visually-grounded-speech topic.
atosystem/SpeechCLIP
SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model, Accepted to IEEE SLT 2022
bhigy/zr-2021vg_baseline
Baselines for the Zero Resource Speech Challenge using Visually Grounded Models of Spoken Language, 2021 edition.
spokenlanguage/platalea
Library for training visually-grounded models of spoken language understanding.
bhigy/textual-supervision
Code for the paper "Textual Supervision for Visually Grounded Spoken Language Understanding".
ShampooWang/SpeechCLIP_plus
SpeechCLIP+: Self-supervised multi-task representation learning for speech via CLIP and speech-image data. Accepted to ICASSP 2024, Self-supervision in Audio, Speech, and Beyond (SASB) workshop.
aayushi12/thesis_dss
Code used in my master's thesis.