Experiment with augmenting a higher percentage of the dataset
Today, it appears that augmentation is only occurring 10% of the time
parrot.py/lib/audio_dataset.py, lines 69 to 72 in 5b57d12
I have a suggestion in a related vein. I think we can simplify the structure of the code here and do away with the `random.uniform(0, 1) >= 0.9` check. This is because the thing that's actually doing the augmentation, i.e. the thing that ends up being called in turn by `self.feature_engineering_augmented`, is itself a probabilistic augmenter transform. That is, part of the code for `augmented_feature_engineering` (which is what gets called by `self.feature_engineering_augmented`) looks like this:
```python
import scipy.io.wavfile
import numpy as np
from audiomentations import Compose, AddGaussianNoise, TimeStretch, Shift

def augmented_feature_engineering( wavFile, settings ):
    fs, rawWav = scipy.io.wavfile.read( wavFile )
    wavData = rawWav
    # <some stuff that I haven't included>
    augmenter = Compose([
        AddGaussianNoise(min_amplitude=0.001, max_amplitude=0.015, p=0.5),
        TimeStretch(min_rate=0.8, max_rate=1.25, p=0.5),
        Shift(min_fraction=-0.5, max_fraction=0.5, p=0.5),
    ])
    wavData = augmenter(samples=np.array(wavData, dtype="float32"), sample_rate=fs)
```
The `p` parameter for transforms like `AddGaussianNoise`, `TimeStretch`, and `Shift` is the probability that that transform will be applied (see, e.g., https://iver56.github.io/audiomentations/waveform_transforms/add_gaussian_noise/ and iver56/audiomentations#168). So, what is currently happening is that:
- the probability of no augmentation at all for an arbitrary training sample is $0.9 + 0.1 \cdot (0.5)^3 = 0.9125$
- or, the probability of there being at least one augmentation for an arbitrary training sample is $1 - 0.9125 = 0.0875$.
(I hope I haven't got the math wrong --- please correct me if I did.)
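For a quick sanity check of those numbers, here's a small standalone simulation of the current structure (a ~10% outer gate in front of three independent p=0.5 transforms); it only mirrors the probability logic, not the actual dataset code:

```python
import random

def is_augmented():
    # Current structure: an outer gate that passes ~10% of samples,
    # then three independent transforms that each fire with probability 0.5.
    if random.uniform(0, 1) < 0.9:
        return False
    return any(random.random() < 0.5 for _ in range(3))

trials = 1_000_000
rate = sum(is_augmented() for _ in range(trials)) / trials
print(rate)  # ~0.0875, i.e. 1 - (0.9 + 0.1 * 0.5**3)
```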
The structure of this code can thus be simplified: drop the `random.uniform(0, 1) >= 0.9` check entirely, and instead just set the `p` values of the transforms to give whatever overall augmentation rate we want.
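As a rough sketch of that simplification (keeping the transform parameters quoted above; which `p` values to actually use is exactly what this issue proposes experimenting with):

```python
from audiomentations import Compose, AddGaussianNoise, TimeStretch, Shift

# No outer random.uniform(0, 1) >= 0.9 gate: the transforms' own p values
# fully determine the augmentation rate. With a common p across the three
# transforms, P(at least one fires) = 1 - (1 - p)**3, so p = 0.5 would give
# ~87.5% of samples augmented instead of today's ~8.75%.
augmenter = Compose([
    AddGaussianNoise(min_amplitude=0.001, max_amplitude=0.015, p=0.5),
    TimeStretch(min_rate=0.8, max_rate=1.25, p=0.5),
    Shift(min_fraction=-0.5, max_fraction=0.5, p=0.5),
])
```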
The only thing to keep in mind here is that for non-augmented datapoints, the feature values are cached, so I believe your approach might result in a performance penalty.
Though to be honest, I'm guessing that the gains from doing more augmentation will outweigh the costs; just a note.
I'm not sure if this helps with making sure the caching works for the non-augmented data points, but we could also set `p` to 1 for the transforms, and adjust the amount of augmentation by keeping `random.uniform(0, 1) >=` some parameter.
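A sketch of that alternative; `augmentation_rate` here is a hypothetical knob standing in for "some parameter", not a name from the repo:

```python
import random
import numpy as np
from audiomentations import Compose, AddGaussianNoise, TimeStretch, Shift

augmentation_rate = 0.5  # hypothetical: fraction of samples to augment

# With p=1.0 every transform fires whenever the augmenter is invoked, so the
# single outer check below is the only source of randomness in how often
# augmentation happens, and non-augmented samples can keep using the cache.
deterministic_augmenter = Compose([
    AddGaussianNoise(min_amplitude=0.001, max_amplitude=0.015, p=1.0),
    TimeStretch(min_rate=0.8, max_rate=1.25, p=1.0),
    Shift(min_fraction=-0.5, max_fraction=0.5, p=1.0),
])

def maybe_augment(wav_data, sample_rate):
    if random.uniform(0, 1) < augmentation_rate:
        samples = np.array(wav_data, dtype="float32")
        return deterministic_augmenter(samples=samples, sample_rate=sample_rate)
    return wav_data
```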
My main thought is just that stacking one probabilistic mechanism on top of another makes the behaviour harder to reason about; it would be clearer if we either removed the `random.uniform` check or made the transforms deterministic by setting `p = 1`.