csteinmetz1/pyloudnorm

loudness normalization using EBU-r128


Hi!

Have you considered adding EBU-r128 normalization?

For example, something similar to the implementation below, which however requires ffmpeg as a dependency:
https://github.com/slhck/ffmpeg-normalize#ebu-r128-normalization

Hi @oplatek,

EBU R128 uses BS.1770 as the algorithm for normalization. Using pyloudnorm should produce very similar results to ffmpeg.

Did you have a specific use case in mind? Currently pyloudnorm only measures integrated loudness but EBU R128 also includes short-term and momentary loudness. Was that what you were referring to?

My use case is comparing relatively short text-to-speech (TTS) or voice-converted (VC) samples, converted from a source speaker and recording condition to a clean target speaker voice.

The samples are typically 2-14 s long, with lengths normally distributed.
I noticed that RMS is sensitive to background noise, e.g. for VC from noisy conditions to clean target conditions.
Since I want to compare noisy and clean utterances side by side, I want them normalized to the same perceived loudness.

In general, I think that peak loudness normalization is the best.
I asked about EBU R128 normalization because some other studies used it, and it also uses peak normalization.
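
To make the RMS point concrete, here is a toy sketch in plain Python. Everything in it is an assumption for illustration: a sine tone stands in for speech, Gaussian noise for the background, and the helper names are hypothetical. It is not how pyloudnorm or ffmpeg measure loudness.

```python
import math
import random

def rms_db(samples):
    """RMS level in dB relative to full scale (1.0). Assumes non-silent input."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(mean_square)

def peak_db(samples):
    """Sample peak level in dB relative to full scale (1.0)."""
    return 20 * math.log10(max(abs(s) for s in samples))

def apply_gain_db(samples, gain_db):
    """Scale samples by a gain given in dB."""
    gain = 10 ** (gain_db / 20)
    return [s * gain for s in samples]

def normalize_rms(samples, target_db):
    return apply_gain_db(samples, target_db - rms_db(samples))

def normalize_peak(samples, target_db):
    return apply_gain_db(samples, target_db - peak_db(samples))

# Toy "utterance": 1 s of a 220 Hz tone followed by 1 s of silence.
rate = 16000
clean = [0.5 * math.sin(2 * math.pi * 220 * n / rate) if n < rate else 0.0
         for n in range(2 * rate)]

# Same utterance with background noise over the whole clip.
rng = random.Random(0)
noisy = [s + rng.gauss(0.0, 0.2) for s in clean]

# Gain that RMS normalization to -20 dB would apply to each clip.
target_db = -20.0
gain_clean_db = target_db - rms_db(clean)
gain_noisy_db = target_db - rms_db(noisy)

# The noise adds energy, so the noisy clip gets less gain and its
# "speech" part ends up quieter than in the clean clip.
print(f"gain for clean clip: {gain_clean_db:.2f} dB")
print(f"gain for noisy clip: {gain_noisy_db:.2f} dB")
```

Both clips end up with the same RMS, but the tone inside the noisy one is attenuated more. Peak normalization (`normalize_peak`) avoids averaging in the noise floor, but a single loud sample then decides the gain for the whole clip.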