Audio Content Analysis

Analyzing pronunciation, phonetics, and transcripts with Wav2Vec2 automatic speech recognition, hosted on Hugging Face using Gradio.

Wav2Vec2 Automatic Speech Recognition: The Wav2Vec2 model was proposed in "wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations" by Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, and Michael Auli (https://arxiv.org/abs/2006.11477). Here, Wav2Vec2 is used to perform live speech recognition and, from the spoken sentence, to detect its phonetic structure, count stutters, and analyze pronunciation. The model is hosted on Hugging Face as a Space for analyzing audio and language.
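As an illustration of the stutter count mentioned above, here is a minimal sketch that works on a transcript string such as the one an ASR model returns. It assumes a stutter can be approximated as a word repeated immediately after itself; the function name count_stutters and this heuristic are hypothetical and are not taken from the project's code, which may use a different method.

```python
import re


def count_stutters(transcript: str) -> int:
    """Count immediate word repetitions in a transcript.

    Assumption (not from the project): a stutter is approximated as a word
    that directly repeats the previous word, so "I I I WANT" counts as 2.
    """
    # Normalize case and keep only alphabetic tokens (apostrophes allowed).
    words = re.findall(r"[a-z']+", transcript.lower())
    # Compare every word with the one before it.
    return sum(1 for prev, cur in zip(words, words[1:]) if prev == cur)


if __name__ == "__main__":
    print(count_stutters("I I I WANT TO TO GO"))
```

This heuristic also counts legitimate repetitions (for example "that that"), so it is best read as a rough upper bound rather than a precise measure.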

The repository also contains code for web hosting with a Flask API. Because the Wav2Vec2 model uses a lot of memory, Hugging Face and Gradio were chosen for deployment instead of Heroku or similar alternatives.

Video demonstration of the project: https://youtu.be/uPhmnVbzSuQ

Link to the Space: https://huggingface.co/spaces/bharathraj-v/audio-content-analysis