Neural Network Audio Reconstruction

These are a variety of ideas about audio signal reconstruction using neural networks. Of course this generalizes to pretty much any time series; there is nothing special about audio. The general idea is that known data is used as the label, and feature data is generated either by reducing the information content of the label or by adding noise to it. The goal, of course, is for the network to learn how to fill in missing data, remove noise, and so on.
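As a rough illustration of this label/feature framing, here is a minimal sketch. The function names and signature are hypothetical, not taken from the notebooks; any degradation function (such as the noise or quantization sketches further below) can be plugged in.

import numpy as np

def make_training_pairs(clean_windows, degrade):
    """Given clean signal windows (labels), build degraded copies (features).

    `degrade` is any function that reduces the information content of a
    window or adds noise to it.
    """
    labels = np.asarray(clean_windows, dtype=np.float32)
    features = np.stack([degrade(w) for w in labels]).astype(np.float32)
    return features, labels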

The original idea is based on my predilection for the Wadia digital-to-analog converters. These use relatively high-order polynomial interpolation to smooth sampled audio data, and Wadia spent a lot of effort identifying which algorithms produced results that study participants thought sounded better. The inspiration for this collection of audio reconstruction experiments comes from the work Wadia was doing decades ago, tempered with the remarkable neural-network visual reconstruction examples we have today (Alex Champandard's Neural Enhance being just one of many).

These are just a number of experiments in convenient Jupyter notebooks; nothing is a finished product. You will need numpy and tensorflow to run them. The basic model is the encoder-decoder, though this can of course be broadened to architectures like the U-Net. The generators are quite slow, so don't expect amazing performance even on your GPU, since this is not as memory-bound a problem as many training exercises. To make this performant, the naive implementation I first wrote using numpy will need to be changed to native tensorflow routines through keras.
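The notebooks' exact architectures are not reproduced here; the following is only a sketch of the encoder-decoder idea in Keras, with illustrative layer sizes and window length that are assumptions rather than the notebooks' actual values.

from tensorflow import keras
from tensorflow.keras import layers

WINDOW = 256  # samples per training window (illustrative)

def build_encoder_decoder(window=WINDOW):
    inputs = keras.Input(shape=(window, 1))
    # Encoder: strided 1-D convolutions compress the window.
    x = layers.Conv1D(16, 9, strides=2, padding="same", activation="relu")(inputs)
    x = layers.Conv1D(32, 9, strides=2, padding="same", activation="relu")(x)
    # Decoder: transposed convolutions restore the original length.
    x = layers.Conv1DTranspose(32, 9, strides=2, padding="same", activation="relu")(x)
    x = layers.Conv1DTranspose(16, 9, strides=2, padding="same", activation="relu")(x)
    outputs = layers.Conv1D(1, 9, padding="same")(x)
    return keras.Model(inputs, outputs)

model = build_encoder_decoder()
model.compile(optimizer="adam", loss="mse")
# features, labels would come from a generator such as those sketched below:
# model.fit(features[..., None], labels[..., None], epochs=10, batch_size=32)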

The first experiment, NoiseReduction.ipynb, adds two types of noise to a curated signal. Each sample feature has randomly generated properties, including frequency and noise parameters. The signal is a single sinusoid with an integer number of cycles in the sample window, and the amplitude is a random variable as well. Gaussian noise is added with a random variance per waveform; combined with the randomly selected sinusoid amplitude, this gives a random signal-to-noise ratio. Random spurious noise is also added: a single sample displaced by a fixed amplitude, with a 50% chance of being above (or below) the sample. This noise occurs with a fixed probability, so it could be considered a Poisson process. The network is trained using mean squared error and an Adam optimizer.
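A sketch of how such a feature/label pair might be generated; the parameter ranges, impulse probability, and amplitude here are assumptions for illustration, not the notebook's actual values.

import numpy as np

def make_noisy_pair(window=256, rng=np.random.default_rng()):
    # Label: a single sinusoid with an integer number of cycles in the
    # window and a randomly chosen amplitude.
    cycles = rng.integers(1, 16)          # illustrative range
    amplitude = rng.uniform(0.2, 1.0)
    t = np.arange(window) / window
    label = amplitude * np.sin(2 * np.pi * cycles * t)

    # Gaussian noise with a per-window random variance; together with the
    # random amplitude this gives a random signal-to-noise ratio.
    sigma = rng.uniform(0.01, 0.2)
    feature = label + rng.normal(0.0, sigma, size=window)

    # Spurious impulse noise: each sample has a small probability of being
    # displaced by a fixed amount, with a 50% chance of going up or down.
    p_spur, spur_amp = 0.01, 0.5
    hits = rng.random(window) < p_spur
    feature[hits] += spur_amp * rng.choice([-1.0, 1.0], size=hits.sum())

    return feature.astype(np.float32), label.astype(np.float32)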

The second experiment, Inverse Quantization, is similar to the Noise Reduction experiment, except that instead of adding noise to the target signal it quantizes it in time. That is, the feature signal is a step approximation of the target signal. The goal is the same: to train the network to reproduce the target signal from the quantized feature. Again, the network is trained using mean squared error and an Adam optimizer.
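A sketch of the time-quantization step; the hold length is an assumed parameter, and the function name is hypothetical.

import numpy as np

def quantize_in_time(label, hold=8):
    """Return a step approximation of `label`: each group of `hold`
    consecutive samples is held at the value of the group's first sample."""
    feature = np.copy(label)
    for start in range(0, len(label), hold):
        feature[start:start + hold] = label[start]
    return feature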

Future Plans

What I want to do with this project is get to the point that the trained audio mappings do something useful. There is also the issue of how to handle sequential frames: nothing guarantees that the mapping from noisy or quantized data to the underlying ground-truth data will preserve continuity when applied across frame boundaries. This is a significant issue that still needs to be resolved.

Related Work

Kuleshov, Enam, and Ermon published an article entitled "Audio Super Resolution using Neural Networks" after I started work on this project; it is very interesting, and they also have a repository associated with that work.