Speech-Captioner-for-English-Videos

A speech-recognition deep learning model that creates and adds subtitles to any English-language video.
Aligns the text timing with the video frames.
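
As a rough illustration of the timing step, the sketch below turns word-level timestamps into SubRip (SRT) subtitle cues and optionally snaps cue boundaries to the nearest video frame. It is not taken from this repository: the `Word` record, the helper names, and the `max_words`, `max_gap`, and `fps` parameters are all hypothetical, and it assumes the recognizer already provides start and end times (in seconds) for each word.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """Hypothetical record for one recognized word and its timing."""
    text: str
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video


def snap_to_frame(seconds: float, fps: float) -> float:
    """Round a time to the nearest frame boundary for a given frame rate."""
    return round(seconds * fps) / fps


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words, max_words=7, max_gap=0.8, fps=None):
    """Group timed words into cues and render them as SRT text.

    A new cue starts when the current one already holds `max_words` words
    or when the silence before the next word exceeds `max_gap` seconds.
    If `fps` is given, cue start and end times are snapped to frames.
    """
    cues, current = [], []
    for word in words:
        too_long = len(current) >= max_words
        long_pause = bool(current) and word.start - current[-1].end > max_gap
        if current and (too_long or long_pause):
            cues.append(current)
            current = []
        current.append(word)
    if current:
        cues.append(current)

    blocks = []
    for index, cue in enumerate(cues, start=1):
        start, end = cue[0].start, cue[-1].end
        if fps is not None:
            start, end = snap_to_frame(start, fps), snap_to_frame(end, fps)
        text = " ".join(w.text for w in cue)
        blocks.append(f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(blocks)
```

Calling `words_to_srt(words, fps=25)` on a list of `Word` objects returns a string that can be written to a `.srt` file next to the video.
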
Reaches a 1.8%/3.3% word error rate (WER) using the wav2vec 2.0 Transformer, the figures reported for wav2vec 2.0 on the LibriSpeech clean/other test sets.
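
For readers unfamiliar with the metric, word error rate is the number of word substitutions, deletions, and insertions needed to turn the model output into the reference transcript, divided by the number of reference words. The sketch below computes it with a standard edit-distance recurrence; the function name `word_error_rate` is hypothetical, it splits on whitespace only, and it is not the evaluation script behind the figures quoted here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] is the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]  # matching i reference words against nothing costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)
```

For example, scoring the hypothesis "the cat sat on mat" against the reference "the cat sat on the mat" counts one deleted word out of six, a WER of about 16.7%. Real evaluations usually normalize case and punctuation before scoring, which this sketch leaves to the caller.
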

yelnady/Speech-Captioner-for-English-Videos