
Experiment with augmenting a higher percentage of the dataset


pokey commented

Currently, augmentation appears to happen only 10% of the time:

```python
if (... and random.uniform(0, 1) >= 0.9):  # the first operand of this condition is elided in the quoted snippet
    if self.augmented_samples[idx] is None:
        self.augmented_samples[idx] = [self.samples[idx][0], self.samples[idx][1], torch.tensor(self.feature_engineering_augmented(self.samples[idx][0])).float()]
    return self.augmented_samples[idx][2], self.augmented_samples[idx][1]
```

I have a related suggestion. I think we can simplify the structure of this code and do away with the `random.uniform(0, 1) >= 0.9` check, because the thing that actually does the augmentation (the thing that `self.feature_engineering_augmented` ends up calling) is already a probabilistic augmenter transform. Part of the code for `augmented_feature_engineering`, which is what `self.feature_engineering_augmented` calls, looks like this:

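```python
import numpy as np
from audiomentations import AddGaussianNoise, Compose, Shift, TimeStretch

def augmented_feature_engineering( wavFile, settings ):
    fs, rawWav = ...  # the call that reads wavFile is cut off in the quoted snippet
    wavData = rawWav
    # <some stuff that I haven't included>
    # Each transform below is applied with probability p (here 0.5).
    augmenter = Compose([
        AddGaussianNoise(min_amplitude=0.001, max_amplitude=0.015, p=0.5),
        TimeStretch(min_rate=0.8, max_rate=1.25, p=0.5),
        Shift(min_fraction=-0.5, max_fraction=0.5, p=0.5),
    ])
    wavData = augmenter(samples=np.array(wavData, dtype="float32"), sample_rate=fs)
```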

The `p` parameter of transforms such as `AddGaussianNoise`, `TimeStretch` and `Shift` is the probability that the transform is applied (see, e.g., … and iver56/audiomentations#168), and each transform in the `Compose` makes that draw independently. So, currently:

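  • the probability of no augmentation at all for an arbitrary training sample is $0.9 + 0.1 \cdot (0.5)^3 = 0.9125$ (the gate is not passed with probability $0.9$; if it is passed, all three transforms are still skipped with probability $(0.5)^3$);
  • equivalently, the probability of at least one augmentation for an arbitrary training sample is $1 - 0.9125 = 0.0875$.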
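To double-check the arithmetic, here is a small sketch that computes the no-augmentation probability exactly and compares it with a Monte Carlo simulation of the two stacked random choices. It assumes, as the bullets do, that the elided first operand of the `if` is true and that the transforms fire independently; the function names are made up for this example.

```python
import random

def p_no_augmentation(gate_threshold: float, p: float, t: int) -> float:
    """Exact probability that a sample gets no transform applied.

    The augmented branch is skipped when random.uniform(0, 1) < gate_threshold
    (probability gate_threshold); otherwise each of the t transforms is
    applied independently with probability p.
    """
    return gate_threshold + (1 - gate_threshold) * (1 - p) ** t

def simulate_no_augmentation(gate_threshold: float, p: float, t: int,
                             trials: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    unaugmented = 0
    for _ in range(trials):
        if rng.uniform(0, 1) >= gate_threshold:
            # Gate passed: each transform is applied with probability p.
            applied = sum(rng.random() < p for _ in range(t))
        else:
            applied = 0  # gate not passed: cached, non-augmented features
        if applied == 0:
            unaugmented += 1
    return unaugmented / trials

exact = p_no_augmentation(0.9, 0.5, 3)
print(f"{exact:.4f}")      # 0.9125
print(f"{1 - exact:.4f}")  # 0.0875
print(f"{simulate_no_augmentation(0.9, 0.5, 3):.3f}")  # roughly 0.91, varies with the seed
```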

(I hope I haven't got the math wrong; please correct me if I did.)

The structure of this code can thus be simplified as follows. Let $q$ be the probability that no augmentation is done; this becomes a hyperparameter that we control. Let $t$ be the number of augmenter transforms we use (3 in the code above). Since each augmenter transform already has a probability parameter $p$, we don't need the equivalent of `random.uniform(0, 1) >= 0.9`. Instead, assuming we use the same $p$ for all the transforms, we can set $p$ from the value we want for $q$ via $(1 - p)^{t} = q \iff p = 1 - \sqrt[t]{q}$. We can then treat $q$ as a hyperparameter to experiment with and tune (as per pokey's suggestion).
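A minimal sketch of that parameterization: given a target no-augmentation probability $q$ and $t$ transforms, compute the shared per-transform $p = 1 - q^{1/t}$. The helper name `per_transform_p` is hypothetical, and it assumes, as above, that the transforms fire independently with the same $p$; the result would then be passed as `p=` to each transform in the `Compose([...])` list.

```python
def per_transform_p(q: float, t: int) -> float:
    """Shared per-transform probability p such that (1 - p) ** t == q,
    i.e. the chance that none of the t transforms is applied equals q."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be a probability in [0, 1]")
    if t < 1:
        raise ValueError("need at least one transform")
    return 1.0 - q ** (1.0 / t)

# Roughly reproduce today's overall behaviour (q = 0.9125) without the outer gate.
p_same = per_transform_p(0.9125, 3)
print(f"{p_same:.3f}")  # about 0.03
# Leave only half of the samples unaugmented instead.
p_half = per_transform_p(0.5, 3)
print(f"{p_half:.4f}")  # 0.2063
```

Note that lowering $q$ raises $p$ for every transform at once, so samples that do get augmented also tend to receive more transforms each.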

pokey commented

The only thing to keep in mind here is that for non-augmented datapoints, the feature values are cached, so I believe your approach might result in a performance penalty.

Tho tbh I'm guessing the benefits of doing more augmentation will outweigh the costs; just a note.

I'm not sure whether this helps keep the caching working for the non-augmented data points, but we could also set `p` to 1 for the transforms and adjust the amount of augmentation by keeping the `random.uniform(0, 1) >= ...` check with a tunable threshold.
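A rough sketch of that alternative, with hypothetical names (`GatedSampler`, `augment_prob`, `featurize`, `featurize_augmented`) standing in for the real dataset code. It assumes every transform in the `Compose` is built with `p=1.0`, so `augment_prob` alone decides whether a sample is augmented, and that plain features are computed once and cached. Unlike the snippet quoted at the top of the issue, this sketch recomputes augmented features on every draw instead of caching them.

```python
import random
from typing import Callable, Dict, List, Tuple

class GatedSampler:
    """Hypothetical stand-in for the dataset's __getitem__ logic.

    Assumes all augmenter transforms use p=1.0, so augment_prob is the
    only randomness in *whether* a sample gets augmented.
    """

    def __init__(self, samples: List[Tuple[object, object]],
                 featurize: Callable[[object], object],
                 featurize_augmented: Callable[[object], object],
                 augment_prob: float, seed: int = 0):
        self.samples = samples
        self.featurize = featurize
        self.featurize_augmented = featurize_augmented
        self.augment_prob = augment_prob  # tunable hyperparameter
        self.rng = random.Random(seed)
        self.cache: Dict[int, object] = {}  # plain features, computed once per index

    def __getitem__(self, idx: int):
        raw, label = self.samples[idx]
        if self.rng.random() < self.augment_prob:
            # Augmented path: recomputed on every draw, never cached.
            return self.featurize_augmented(raw), label
        if idx not in self.cache:
            self.cache[idx] = self.featurize(raw)
        return self.cache[idx], label

plain_calls = []

def plain_features(raw):
    plain_calls.append(raw)  # record each (expensive) plain featurization
    return f"plain:{raw}"

def augmented_features(raw):
    return f"aug:{raw}"

never = GatedSampler([("clip0", 0)], plain_features, augmented_features, augment_prob=0.0)
print(never[0], never[0], len(plain_calls))  # ('plain:clip0', 0) ('plain:clip0', 0) 1
always = GatedSampler([("clip0", 0)], plain_features, augmented_features, augment_prob=1.0)
print(always[0])  # ('aug:clip0', 0)
```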

My main thought is just that stacking one probabilistic thing on top of another makes things harder to reason about. It would be clearer if we either removed the `random.uniform` check or made the transforms' application deterministic by setting `p = 1`.