- This repo tracks updates to my GSoC 2024 project proposal and holds some helper code that I wrote for the proposed project.
- Please download the relevant AudioSet data from the link.
- Download samples of the People's Speech Dataset from this link.
- Note that you can also use utils.ipynb as a script to download these datasets.
- Other tools that I have tried for the proposal include yt-dlp, shot_detection, and video_llava.
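
As a rough illustration of how AudioSet clips could be fetched with yt-dlp, here is a minimal sketch. It is not the code in utils.ipynb, which may work differently. It assumes the segment CSV layout published with AudioSet (comment lines starting with `#`, then `YTID, start_seconds, end_seconds, positive_labels`), and the function names `parse_segments` and `ytdlp_command` and the output folder `audioset_clips` are hypothetical. The sketch only builds the yt-dlp command line and does not download anything itself.

```python
import csv
import io


def parse_segments(csv_text):
    """Yield (youtube_id, start_s, end_s) from AudioSet-style segment CSV text.

    Assumes lines starting with '#' are comments and that each data row is
    `YTID, start_seconds, end_seconds, "labels"`.
    """
    rows = (line for line in io.StringIO(csv_text) if not line.startswith("#"))
    # skipinitialspace lets the quoted label field after ", " parse as one field
    for row in csv.reader(rows, skipinitialspace=True):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        yield row[0], float(row[1]), float(row[2])


def ytdlp_command(youtube_id, start_s, end_s, out_dir="audioset_clips"):
    """Build a yt-dlp argument list that extracts one clip as WAV audio."""
    return [
        "yt-dlp",
        # download only the labelled time range (in seconds)
        "--download-sections", f"*{start_s:g}-{end_s:g}",
        # keep audio only and convert it to WAV (requires ffmpeg)
        "-x", "--audio-format", "wav",
        "-o", f"{out_dir}/{youtube_id}_{start_s:g}.%(ext)s",
        f"https://www.youtube.com/watch?v={youtube_id}",
    ]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` once yt-dlp and ffmpeg are installed. Keeping command construction separate from execution makes it easy to inspect the commands before starting a large download.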
kolubex/Visual_Aware_E2E_ST
This repo is made for the GSoC2024 proposal for the project Visual Aware E2E Speech Transcription listed by Red Hen Lab.
Jupyter Notebook · MIT