Pinned Repositories
AVLnet
Code for the AVLnet (Interspeech 2021) and Cascaded Multilingual (Interspeech 2021) papers.
awesome-video-text-retrieval
A curated list of deep learning resources for video-text retrieval.
c2kd
Code for the C2KD paper (ICASSP 2023)
everything_at_once
Implementation of "Everything at Once - Multi-modal Fusion Transformer for Video Retrieval" (CVPR 2022)
MIL-NCE_HowTo100M
PyTorch GPU distributed training code for MIL-NCE HowTo100M
MIT-6.058-Notebook
Notebook collection I made for the MIT IAP Signals and Systems class
MUSIC_dataset
MUSIC Dataset from The Sound of Pixels (ECCV '18)
Sound-of-Pixels
Codebase for ECCV18 "The Sound of Pixels"
video_feature_extractor
Easy-to-use extractor for deep video features
whisper-flamingo
[Interspeech 2024] Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation
roudimit's Repositories
roudimit/MUSIC_dataset
MUSIC Dataset from The Sound of Pixels (ECCV '18)
roudimit/whisper-flamingo
[Interspeech 2024] Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation
roudimit/AVLnet
Code for the AVLnet (Interspeech 2021) and Cascaded Multilingual (Interspeech 2021) papers.
roudimit/c2kd
Code for the C2KD paper (ICASSP 2023)
roudimit/Sound-of-Pixels
Codebase for ECCV18 "The Sound of Pixels"
roudimit/video_feature_extractor
Easy-to-use extractor for deep video features
roudimit/awesome-video-text-retrieval
A curated list of deep learning resources for video-text retrieval.
roudimit/everything_at_once
Implementation of "Everything at Once - Multi-modal Fusion Transformer for Video Retrieval" (CVPR 2022)
roudimit/MIL-NCE_HowTo100M
PyTorch GPU distributed training code for MIL-NCE HowTo100M
roudimit/MIT-6.058-Notebook
Notebook collection I made for the MIT IAP Signals and Systems class
roudimit/ResDAVEnet-VQ
roudimit/Spoken-ObjectNet
Official code for Spoken ObjectNet: A Bias-Controlled Spoken Caption Dataset (Interspeech 2021)