Using neural networks on raw audio waveforms is hard, because they contain so many data points (tens of thousands per second). Therefore, we usually transform the waveform into an image: a spectrogram. The spectrogram is then used as the input for the neural network to process the sound. However, recovering an actual sound waveform from a spectrogram is tricky, because some information (most notably the phase) is lost in the transformation. In neural_vocoder.ipynb, I train a convolutional neural network (CNN) to infer cat meow waveforms from spectrograms. The aim is to use a simpler architecture than cutting-edge models, whilst still getting decent results.
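As a minimal sketch of the round trip described above, the snippet below computes a mel spectrogram with librosa and inverts it with the classical Griffin-Lim baseline. The file name `meow.wav` and the parameters (`n_fft`, `hop_length`, `n_mels`) are illustrative assumptions, not necessarily what neural_vocoder.ipynb uses:

```python
import librosa

# Load a waveform: tens of thousands of samples per second.
# "meow.wav" is a placeholder path for any short audio clip.
y, sr = librosa.load("meow.wav", sr=22050)
print(f"{len(y)} samples at {sr} Hz")

# Transform to a mel spectrogram "image": only the magnitude is
# kept, so the phase information is discarded.
S = librosa.feature.melspectrogram(
    y=y, sr=sr, n_fft=1024, hop_length=256, n_mels=80
)
print(f"spectrogram shape: {S.shape}")  # (n_mels, n_frames)

# A non-neural inversion baseline: Griffin-Lim phase estimation.
# The result is audible but degraded, since the phase must be
# guessed - this is the gap a neural vocoder tries to close.
y_hat = librosa.feature.inverse.mel_to_audio(
    S, sr=sr, n_fft=1024, hop_length=256
)
```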
DALL-E prompt: "A stern-looking cat wearing a futuristic glowing waveform".