csteinmetz1/pyloudnorm

Requirements for input data

Mistobaan opened this issue · 3 comments

What are the requirements besides the one in valid_audio for a numpy array input?

  • Float32?
  • Already unit normalized (-1.0 <= y <= 1.0)?
  • Anything else?

pyloudnorm assumes the data is a floating-point array of some kind (I'm under the assumption that float32 and float64 would both be valid choices), scaled from -1.0 to 1.0, as you mention. If you try to measure the loudness of an array that's in an integer format, you'll quickly run into an error. All internal math is handled by numpy using ndarrays.
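A minimal sketch of the kind of input assertions discussed in this thread. `check_audio_input` is a hypothetical helper, not part of pyloudnorm; it encodes only the assumptions stated above (floating-point samples scaled to -1.0 to 1.0), plus a finiteness check that is an added assumption.

```python
import numpy as np

def check_audio_input(data):
    """Hypothetical pre-check: raise ValueError if the array breaks the assumptions above."""
    if not isinstance(data, np.ndarray):
        raise ValueError("expected a numpy ndarray")
    # Integer arrays (e.g. int16 straight from a WAV file) must be converted to float first.
    if not np.issubdtype(data.dtype, np.floating):
        raise ValueError(f"expected a floating-point array, got {data.dtype}")
    # Assumption: NaN or inf samples are treated as invalid input.
    if not np.all(np.isfinite(data)):
        raise ValueError("array contains NaN or inf")
    # Samples are assumed to be scaled to the range -1.0 to 1.0.
    peak = float(np.max(np.abs(data))) if data.size else 0.0
    if peak > 1.0:
        raise ValueError(f"samples exceed +/-1.0 (peak {peak:.3f})")
    return data
```

Raising on the first violated assumption keeps the error message specific, which is the main benefit asserts would add over a failure deep inside the loudness computation.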

Are you facing a particular issue loading data? I have had success using pysoundfile to load audio from WAV files, as it automatically performs the int-to-float conversion. I've used pydub to open encoded audio files such as MP3, although I have experimented less with that combination in pyloudnorm. Let me know if you run into any issues; if not, I can close this issue.

I think some asserts on the input would be a nice feature to have.
I am having issues with the loudness_normalization function, as it returns values outside +/-1.0 with the default -12 input size. It feels like there is some constraint to be met on the delta between the input loudness and the desired loudness.

I use 16-bit WAV files and load them with scipy.io.wavfile.read. I convert them to float, normalize by the range so that they are between -1.0 and 1.0, compute the loudness, and then pass the result through loud_norm.
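For 16-bit PCM from scipy.io.wavfile.read (which returns int16 samples for 16-bit files), one common conversion divides by a fixed full-scale value of 2**15 = 32768. This is a sketch under that assumption, and `pcm16_to_float` is a hypothetical name. The point of a fixed divisor, rather than each file's own min/max, is that it preserves the signal's absolute level: dividing by the signal's own peak would amount to a peak normalization and silently change the loudness you then measure.

```python
import numpy as np

def pcm16_to_float(samples):
    """Hypothetical helper: map int16 PCM samples to float64 in [-1.0, 1.0)."""
    if samples.dtype != np.int16:
        raise ValueError(f"expected int16 samples, got {samples.dtype}")
    # Fixed full-scale divisor: -32768 maps to -1.0, 32767 to just under 1.0.
    return samples.astype(np.float64) / 32768.0
```

Usage would look like `rate, samples = scipy.io.wavfile.read(path)` followed by `data = pcm16_to_float(samples)`, where `path` is whatever file you are loading.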

You may be on an older version, as the loudness_normalization() function no longer exists; it has been replaced by the peak() and loudness() functions in normalize.py. Those functions check for clipped samples and throw a warning if they find any.
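A minimal sketch of gain-based loudness normalization with a clipping check, in the spirit of what the loudness() function is described as doing above. This is not pyloudnorm's source: `normalize_to_target` is a hypothetical name, and it assumes the integrated loudness has already been measured (for example with pyloudnorm's `Meter`).

```python
import warnings
import numpy as np

def normalize_to_target(data, measured_lufs, target_lufs):
    """Hypothetical: apply the single linear gain that moves measured_lufs to target_lufs."""
    # Loudness differences are in dB, so convert the difference to a linear amplitude factor.
    gain = 10.0 ** ((target_lufs - measured_lufs) / 20.0)
    output = data * gain
    # A pure gain change does not bound the peaks, so warn if any sample now reaches full scale.
    if data.size and np.max(np.abs(output)) >= 1.0:
        warnings.warn("possible clipped samples in output")
    return output
```

Because the same factor multiplies every sample, the output is only a warning, not a fix: the clipping has to be handled by the caller.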

What do you mean by "the default -12 input size"? Also, could you explain what asserts you think would be beneficial to include? Appreciate your input.

When normalizing to a target LUFS level, the output signal may clip (exceed +/-1.0). Whether it does depends on the dynamic range of the content. For example, a recording that has a very small amplitude for a long period of time, followed by a short, loud impulsive sound, would clip if we adjusted the integrated loudness of the whole recording to be very high (something like -6 LUFS). Because the integrated loudness is dominated by the long quiet section, reaching the target requires a large gain, and that same gain pushes the short peak past full scale.
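Whether a target is reachable without clipping can be checked before applying any gain. With a linear gain, every sample, including the peak, is scaled by 10^((target - measured)/20), so the highest clip-free target is the measured loudness plus the peak's headroom below full scale. The helper name and numbers are illustrative, and the sketch considers sample peaks only (inter-sample peaks are ignored).

```python
import math

def max_clip_free_target(measured_lufs, sample_peak):
    """Hypothetical: highest target loudness reachable with pure gain before the peak hits 1.0."""
    if not 0.0 < sample_peak <= 1.0:
        raise ValueError("sample_peak must be in (0, 1]")
    # Headroom of the peak below full scale, in dB (always >= 0 here).
    headroom_db = -20.0 * math.log10(sample_peak)
    return measured_lufs + headroom_db

# A quiet recording measured at -30 LUFS whose short transient peaks at 0.5
# can only be raised by about 6 dB, to roughly -24 LUFS, before it clips.
limit = max_clip_free_target(-30.0, 0.5)
```

Asking for -6 LUFS in that case would need about 24 dB of gain, pushing the transient to roughly 0.5 × 15.8 ≈ 7.9.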

Edit: I wanted to add that, in the example above, reaching the target LUFS without clipping would require some kind of non-linear transformation (distortion, compression, etc.). It can't be achieved with a simple gain adjustment.
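As one illustration of such a non-linear step, a tanh soft clipper bounds every sample to [-1.0, 1.0] no matter how much gain came before it. This is a sketch under stated assumptions, not something pyloudnorm provides: `soft_clip` is hypothetical, it audibly distorts loud passages, and it changes the integrated loudness, so the result would need to be re-measured (and possibly re-gained) against the target.

```python
import numpy as np

def soft_clip(data):
    """Hypothetical tanh soft clipper: near-linear for small samples, saturating toward +/-1."""
    # tanh is odd and bounded by 1, so each sample keeps its sign but cannot exceed full scale.
    return np.tanh(data)

# Illustrative signal: a long quiet passage with one short, loud impulse.
signal = np.full(48000, 0.01)
signal[24000] = 0.5
gain = 10.0 ** (24.0 / 20.0)   # an illustrative +24 dB boost
boosted = signal * gain        # the impulse now reaches about 7.9 and would clip
limited = soft_clip(boosted)   # every sample is now within [-1.0, 1.0]
```

A real limiter or compressor would apply gain reduction more selectively, but the underlying point is the same: only a level-dependent (non-linear) change can keep the peak in range while raising the quiet part.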