In this project, we apply a convolutional neural network (CNN) to speech recognition tasks.
CNN is a versatile technique used mainly for computer vision tasks, where it has significantly improved the accuracy of image classification and object detection. In Automatic Speech Recognition (ASR), however, CNNs have not played a significant part, even though other networks such as RNNs and ANNs are being incorporated into many speech recognition models.
The idea is to convert speech into images and then train the CNN to recognize words in these images. For the visualization step, we convert the audio data into a spectrogram, an efficient visual representation of audio that shows how its frequency content changes over time.
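
The audio-to-image step can be sketched roughly as follows. This is a minimal illustration using only NumPy, not necessarily the pipeline used in this project: the function name `log_spectrogram`, the frame length, the hop size, and the synthetic 440 Hz test tone are all assumptions chosen for the example.

```python
import numpy as np


def log_spectrogram(signal, frame_len=256, hop=128, eps=1e-10):
    """Return a log-power spectrogram with shape (freq_bins, frames).

    Hypothetical helper for illustration; frame_len and hop are assumed values.
    """
    signal = np.asarray(signal, dtype=np.float64)
    if signal.size < frame_len:
        raise ValueError("signal is shorter than one frame")

    # Split the signal into overlapping frames and taper each with a Hann window.
    window = np.hanning(frame_len)
    n_frames = 1 + (signal.size - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )

    # Real FFT of every frame gives frame_len // 2 + 1 frequency bins.
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2

    # Convert to decibels; eps avoids log(0). Transpose so rows are frequencies.
    return 10.0 * np.log10(power + eps).T


# Assumed test input: one second of a 440 Hz tone sampled at 8 kHz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440.0 * t)

spec = log_spectrogram(tone)
print(spec.shape)  # (frequency bins, time frames)
```

The resulting 2-D array can be treated as a single-channel image: each column is a short slice of time and each row a frequency band, which is the form a CNN expects as input. Real recordings would be loaded from audio files instead of the synthetic tone used here.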