roudimit

PhD Student at MIT CSAIL

Massachusetts Institute of Technology, Cambridge, Massachusetts

Pinned Repositories

AVLnet
Code for the AVLnet (Interspeech 2021) and Cascaded Multilingual (Interspeech 2021) papers.
Language: Python 49 1 16
awesome-video-text-retrieval
A curated list of deep learning resources for video-text retrieval.
0 0 00
c2kd
Code for the C2KD paper (ICASSP 2023)
Language: Python 17 1 01
everything_at_once
Implementation of "Everything at Once - Multi-modal Fusion Transformer for Video Retrieval" (CVPR 2022)
Language: Python 0 0 00
MIL-NCE_HowTo100M
PyTorch GPU distributed training code for MIL-NCE HowTo100M
Language: Python 0 0 00
MIT-6.058-Notebook
Notebooks collection I made for the MIT IAP Signals and Systems class
Language: Jupyter Notebook 0 0 00
MUSIC_dataset
MUSIC Dataset from The Sound of Pixels (ECCV '18)
119 9 1026
Sound-of-Pixels
Codebase for ECCV18 "The Sound of Pixels"
Language: Python 1 0 00
video_feature_extractor
Easy to use video deep features extractor
Language: Python 1 0 02
whisper-flamingo
[Interspeech 2024] Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation
Language: Jupyter Notebook 95 3 36

roudimit/MUSIC_dataset
MUSIC Dataset from The Sound of Pixels (ECCV '18)
119 9 1026
roudimit/whisper-flamingo
[Interspeech 2024] Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation
Language: Jupyter Notebook 95 3 36
roudimit/AVLnet
Code for the AVLnet (Interspeech 2021) and Cascaded Multilingual (Interspeech 2021) papers.
Language: Python 49 1 16
roudimit/c2kd
Code for the C2KD paper (ICASSP 2023)
Language: Python 17 1 01
roudimit/Sound-of-Pixels
Codebase for ECCV18 "The Sound of Pixels"
Language: Python 1 0 00
roudimit/video_feature_extractor
Easy to use video deep features extractor
Language: Python 1 0 02
roudimit/awesome-video-text-retrieval
A curated list of deep learning resources for video-text retrieval.
0 0 00
roudimit/everything_at_once
Implementation of "Everything at Once - Multi-modal Fusion Transformer for Video Retrieval" (CVPR 2022)
Language: Python 0 0 00
roudimit/MIL-NCE_HowTo100M
PyTorch GPU distributed training code for MIL-NCE HowTo100M
Language: Python 0 0 00
roudimit/MIT-6.058-Notebook
Notebooks collection I made for the MIT IAP Signals and Systems class
Language: Jupyter Notebook 0 0 00
roudimit/ResDAVEnet-VQ
Language: Jupyter Notebook 0 1 00
roudimit/Spoken-ObjectNet
Official code for Spoken ObjectNet: A Bias-Controlled Spoken Caption Dataset (Interspeech 2021)
Language: Shell 0 0 00