- A deep learning speech recognition model that generates subtitles and adds them to any English-language video.
- Aligns the text timing with the video frames.
- Reaches a word error rate (WER) of 1.8%/3.3% on the LibriSpeech clean/other test sets using the wav2vec 2.0 Transformer.
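
For readers unfamiliar with the metric, the word error rate quoted above is the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the model output, divided by the number of reference words. The sketch below is a minimal illustration of that definition, not code from this repository; the function name `word_error_rate` and the lowercase, whitespace-based tokenization are assumptions made for the example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Hypothetical helper for illustration; tokenization here is a plain
    lowercase whitespace split, which is an assumption of this sketch.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (classic Levenshtein, row by row).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
    print(f"WER: {wer:.3f}")
```

A WER of 1.8% therefore corresponds to roughly 18 word-level errors per 1,000 reference words.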
yelnady/Speech-Captioner-for-English-Videos