johnmartinsson/bird-species-classification

Same Class and Noise Addition

Closed this issue · 1 comments

Implement additive data augmentation methods. These will be used to encourage the network to see more combined species samples, and to see more variations of noise/structure.

Methods:

  • load multiple random noise samples
  • load multiple random signal samples of the same class
  • additively combine multiple samples of time-frequency data

Same Class

"We follow [14] and add sound files that correspond to the same class. Adding is a simple process because each sound file can be represented by a single vector. If one of the sound files is shorter than the other we repeat the shorter one as many times as it is necessary. After adding two sound files, we re-normalize the result to preserve the original maximum amplitude of the sound files. The operation describes the effect of multiple birds (of the same species) singing at the same time. Adding files improves convergence because the neural network sees more important patterns at once, we also found a slight increase in the accuracy of the system (see Table 1)." (Sprengel et al, 2016)

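A minimal sketch of the same-class addition described in the quote, assuming each sound file is already loaded as a non-empty 1-D NumPy array. The names `tile_to_length` and `add_same_class` are hypothetical and are not taken from this repository. "Original maximum amplitude" is interpreted here as the larger of the two input peaks, which is an assumption.

```python
import numpy as np


def tile_to_length(wave, length):
    """Repeat `wave` end to end and cut it to exactly `length` samples."""
    repeats = -(-length // len(wave))  # ceiling division
    return np.tile(wave, repeats)[:length]


def add_same_class(wave_a, wave_b):
    """Add two same-class waveforms and rescale to the original peak."""
    wave_a = np.asarray(wave_a, dtype=np.float64)
    wave_b = np.asarray(wave_b, dtype=np.float64)

    # Repeat the shorter file until it is as long as the longer one.
    length = max(len(wave_a), len(wave_b))
    a = tile_to_length(wave_a, length)
    b = tile_to_length(wave_b, length)

    mixed = a + b

    # Assumption: the target peak is the larger peak of the two inputs.
    target_peak = max(np.abs(a).max(), np.abs(b).max())
    mixed_peak = np.abs(mixed).max()
    if mixed_peak > 0:
        mixed *= target_peak / mixed_peak
    return mixed
```

Calling `add_same_class(a, b)` returns an array as long as the longer input, with its peak rescaled to the chosen target.
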
Adding Noise

"One of the most important augmentation steps is to add background noise. In Section 2.1 we described how we split each file into a signal and noise part. For every signal sample we can choose an arbitrary noise sample (since the background noise should be independent of the class label) and add it on top of the original training sample at hand. As for combining same class audio files, this operation should be done in the time domain by adding both sound files and repeating the smaller one as often as necessary. We can even add multiple noise samples. In our test we found that three noise samples added on top of the signal, each with a dampening factor of 0.4 produces the best results. This means that, given enough training time, for a single training sample we eventually add every possible background noise which decreases the generalization error." (Sprengel et al, 2016)

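The noise step could be sketched as follows, assuming the signal and noise parts are non-empty 1-D NumPy arrays. The names `add_noise` and `sample_noises` are hypothetical. The defaults of three noise samples and a dampening factor of 0.4 come from the quote; keeping the output at the signal's length (repeating or cutting each noise sample to fit) and skipping re-normalization are assumptions of this sketch.

```python
import numpy as np


def tile_to_length(wave, length):
    """Repeat `wave` end to end and cut it to exactly `length` samples."""
    repeats = -(-length // len(wave))  # ceiling division
    return np.tile(wave, repeats)[:length]


def sample_noises(noise_pool, rng, count=3):
    """Pick `count` noise samples at random (with replacement) from a pool."""
    indices = rng.choice(len(noise_pool), size=count, replace=True)
    return [noise_pool[i] for i in indices]


def add_noise(signal, noises, dampening=0.4):
    """Add each noise sample, scaled by `dampening`, on top of the signal."""
    signal = np.asarray(signal, dtype=np.float64)
    augmented = signal.copy()
    for noise in noises:
        noise = np.asarray(noise, dtype=np.float64)
        # Assumption: the output keeps the signal's length, so each noise
        # sample is repeated or cut to match it.
        augmented += dampening * tile_to_length(noise, len(signal))
    return augmented
```

For example, `add_noise(signal, sample_noises(noise_pool, np.random.default_rng()))` draws three random noise samples and layers them on one training sample. Drawing fresh noise every epoch is what lets a sample eventually meet many different backgrounds.
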
The methods for this are now implemented. They have not been thoroughly tested, but comparing the resulting augmented signal with the signals used to augment it suggests they are working as intended.