azul3d-legacy/audio

WAV decoder performance

slimsag opened this issue · 13 comments

I've added a benchmark to the audio/wav decoder (590de9cb01f017e31c5f798028fef1920dcf36a2). Profiling it with pprof shows that 76% of the time is spent in binary.Read operations (it uses reflect, so most of that time goes to allocating slices with runtime.makeslice).

I think this issue can be solved by buffering the data into a slice, and then performing unsafe conversions where possible.

@mewmew I've been thinking about the best way to go forward with the audio package and friends. It got me thinking that we need audio codecs independent of their file (container formats):

  • azul3d.org/audio/codec/pcm.v1 - could handle fast translation of binary data to PCM8 / PCM16 / PCM24 / PCM32 / F32 / F64 / MuLaw / ALaw and back. The .wav file decoder (azul3d.org/audio/wav.v1) would utilize this package.
  • azul3d.org/audio/codec/flac.v1 - could handle decoding and encoding of FLAC streams (they aren't tied to OGG files). Then the .flac decoder (azul3d.org/audio/flac.v1) would utilize this package.

Based on my (very rough) benchmarking for mdlayher/waveform, any improvement to audio decoding would be hugely helpful. It is definitely a pretty lengthy process, and if optimizations can be made (removing binary.Read) at the expense of a few more lines of code, it would almost certainly be worth the trouble.

It's great that you are adding benchmarks. I feel a bit uneasy about using unsafe. If we can't find any other way to find similar performance improvements without using unsafe, and the improvements are substantial I guess it cannot be helped. At the very least we should make sure not to do too much magic with unsafe; basically only using it to avoid duplicate allocations.

azul3d.org/audio/codec/flac.v1 - could handle decoding and encoding of FLAC streams (they aren't tied to OGG files). Then the .flac decoder (azul3d.org/audio/flac.v1) would utilize this package.

The audio sample decoding is already performed by the frame package, metadata decoding is performed by the meta package, and the .flac file container format is decoded by the flac package.

It would be trivial to implement a front-end (azul3d.org/audio/codec/flac.v1) for the audio sample decoding, using the frame package as a backend.

I've added more detailed benchmarks (azul3d-legacy/audio-wav@471a994), and improved the tests for the wav decoder. The benchmarks below show the time for decoding an entire 11s wav file:

BenchmarkDecodeFloat32        30     533053090 ns/op
BenchmarkDecodeFloat64        30     538004120 ns/op
BenchmarkDecodeUint8          20     570293048 ns/op
BenchmarkDecodeInt16          20     563579666 ns/op
BenchmarkDecodeInt24          30     527503099 ns/op
BenchmarkDecodeInt32          30     524333476 ns/op
BenchmarkDecodeALaw           20     564546919 ns/op
BenchmarkDecodeMuLaw          20     574923546 ns/op

Maybe we will want to use shorter (one second?) wav files for the benchmarks since they must be measured in ns/op. FWIW 500000000ns == 0.5s, so we are able to decode an entire 11s wav file right now in roughly 0.5s total (this is on a rather old Pentium dual core laptop).

@mewmew

I feel a bit uneasy about using unsafe. If we can't find any other way to find similar performance improvements without using unsafe, and the improvements are substantial I guess it cannot be helped. At the very least we should make sure not to do too much magic with unsafe; basically only using it to avoid duplicate allocations.

Right. I feel this way too. I like clean, pure, and idiomatic Go code so I think we should only use unsafe if it:

  1. Gives us a significant performance boost.
  2. Doesn't require too much unsafe code.
  3. Can be well documented and easily understood.
  4. Most important: Is impossible to do with plain Go code.

I will investigate further.

The audio sample decoding is already performed by the frame package, metadata decoding is performed by the meta package, and the .flac file container format is decoded by the flac package.

It would be trivial to implement a front-end (azul3d.org/audio/codec/flac.v1) for the audio sample decoding, using the frame package as a backend.

Yeah, I had this exact idea in mind. I think the whole idea of azul3d.org/audio/codec needs more thought though. I think we shouldn't do it unless someone would use it (i.e. if someone was implementing a video decoder or something and wanted to use the flac decoder).

@mdlayher

Based on my (very rough) benchmarking for mdlayher/waveform, any improvement to audio decoding would be hugely helpful. It is definitely a pretty lengthy process, and if optimizations can be made (removing binary.Read) at the expense of a few more lines of code, it would almost certainly be worth the trouble.

I completely agree. It will be interesting to see how these improvements work for your waveform package. =) Maybe you should create some benchmarks for it (if there aren't already).

I do have some basic benchmarks for reading audio and computing the root mean square of audio.F64Samples. These files are 5 seconds in duration. It's interesting how much faster FLAC is than WAV.

[zsh|matt@matt-2]:~/go/waveform 0 (master) ± go test -run=NONE -bench=Compute
PASS
BenchmarkComputeValuesWAV         10     222610476 ns/op
BenchmarkComputeValuesFLAC        20      80487883 ns/op
ok      github.com/mdlayher/waveform    4.146s

I don't have any sample reading loop-specific benchmarks yet, but it might be worth looking into. I'd be glad to try out any improvements that are made!
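For readers unfamiliar with the computation being benchmarked, here is a rough sketch of a root mean square over a block of float64 samples. A plain []float64 stands in for audio.F64Samples, and the function name rms is an assumption for illustration; it is not taken from the waveform package.

```go
package main

import (
	"fmt"
	"math"
)

// rms returns the root mean square of samples: sqrt(sum(x^2) / n).
// It returns 0 for an empty slice to avoid dividing by zero.
func rms(samples []float64) float64 {
	if len(samples) == 0 {
		return 0
	}
	var sum float64
	for _, s := range samples {
		sum += s * s
	}
	return math.Sqrt(sum / float64(len(samples)))
}

func main() {
	fmt.Println(rms([]float64{1, -1, 1, -1})) // 1
}
```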

I've just made a commit that improves the decoder across the board by 16-22%, by removing binary.Size calls from tight loops (not using unsafe). More to come.

Type     Time Spent    Before          After
F32      -20%          0.518081917s    0.414876430s
F64      -22%          0.547400233s    0.431367131s
PCM8     -20%          0.491020468s    0.390492699s
PCM16    -19%          0.504408065s    0.409407364s
PCM24    -18%          0.657118984s    0.539338196s
PCM32    -19%          0.516224602s    0.416405221s
ALaw     -16%          0.533577493s    0.445839555s
MuLaw    -20%          0.488511090s    0.392282710s
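The binary.Size change can be illustrated roughly as follows. binary.Size goes through reflection, so calling it on every iteration of a per-sample loop adds up; computing it once before the loop avoids that. The function sampleOffsets is a hypothetical stand-in for the decoder's loop, not the actual commit.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// sampleOffsets is a hypothetical loop that needs the encoded size of a
// sample on every iteration. The binary.Size call is hoisted out of the loop.
func sampleOffsets(n int) []int {
	var sample int16
	size := binary.Size(sample) // computed once: 2 bytes for an int16
	offsets := make([]int, n)
	for i := range offsets {
		offsets[i] = i * size
	}
	return offsets
}

func main() {
	fmt.Println(sampleOffsets(4)) // [0 2 4 6]
}
```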

Here is yet another round without any unsafe things. This time we get an 8-11% performance improvement by simply moving variable declarations outside of tight loops in the decoder.

Type     Time Spent    Before          After
F32      -9%           0.414876430s    0.377842782s
F64      -9%           0.431367131s    0.392514910s
PCM8     -8%           0.390492699s    0.359224923s
PCM16    -9%           0.409407364s    0.374009324s
PCM24    -8%           0.539338196s    0.495933463s
PCM32    -11%          0.416405221s    0.372523136s
ALaw     -8%           0.445839555s    0.409146299s
MuLaw    -8%           0.392282710s    0.360234622s

Great news: Roughly 60% speed improvement for PCM8/16/32 and ALaw/MuLaw decoding!

encoding/binary actually has fast-paths implemented for all standard signed and unsigned integer types. Sadly not for float32 or float64 types though (which I can't add due to Go's version 1 compatibility guarantee as it would change an exposed interface type in that package).

Type     Time Spent    Before          After
F32      -2%           0.440716364s    0.432334441s
F64      -4%           0.474706062s    0.453572434s
PCM8     -64%          0.420314742s    0.153145169s
PCM16    -62%          0.427091070s    0.162676099s
PCM24    -0%           0.571109580s    0.571630756s
PCM32    -62%          0.439910450s    0.166665237s
ALaw     -57%          0.477538930s    0.204175348s
MuLaw    -62%          0.417263679s    0.157148155s

I am now happy with all decoding performance except F32, F64, PCM24, and potentially ALaw. I'll see what else I can do for those, most likely by removing their dependency on binary.Read and doing the binary reads ourselves (I think we can do it without unsafe too, but I'm not sure yet).
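PCM24 is the awkward case because Go has no 24-bit integer type for binary.Read to fill. A minimal sketch of doing the read by hand, assuming packed little-endian signed 24-bit samples (as in typical WAV data) widened to int32; decodePCM24 is a hypothetical name, not the decoder's code.

```go
package main

import "fmt"

// decodePCM24 converts packed little-endian signed 24-bit samples to int32.
// The top byte is converted through int8 so the sign is extended correctly.
// Trailing bytes that do not form a whole sample are ignored.
func decodePCM24(src []byte) []int32 {
	out := make([]int32, len(src)/3)
	for i := range out {
		b := src[3*i : 3*i+3]
		out[i] = int32(b[0]) | int32(b[1])<<8 | int32(int8(b[2]))<<16
	}
	return out
}

func main() {
	data := []byte{
		0x01, 0x00, 0x00, // 1
		0xff, 0xff, 0xff, // -1
		0x00, 0x00, 0x80, // minimum 24-bit value
	}
	fmt.Println(decodePCM24(data)) // [1 -1 -8388608]
}
```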

Funnily enough, we can use binary.Read into a uint32 or uint64 (both of which have a fast-path), and then use math.Float32frombits or math.Float64frombits to get the data back out (these are written internally using unsafe). This gives us a 64-65% improvement in decoding F32/F64 wav data.

Type     Time Spent    Before          After
F32      -64%          0.487533093s    0.175575936s
F64      -65%          0.525656683s    0.181460060s

Here's a quick benchcmp testing my waveform package using audio/wav.v1 vs. audio-wav master.

[zsh|matt@nerr-2]:~/go/waveform 0 *(master) ± benchcmp old.txt new.txt 
benchmark                      old ns/op     new ns/op     delta
BenchmarkComputeValuesWAV      90622153      42283885      -53.34%
BenchmarkComputeValuesFLAC     33116305      32799771      -0.96%

benchmark                      old allocs     new allocs     delta
BenchmarkComputeValuesWAV      553779         220539         -60.18%
BenchmarkComputeValuesFLAC     771            771            +0.00%

benchmark                      old bytes     new bytes     delta
BenchmarkComputeValuesWAV      24262212      4246494       -82.50%
BenchmarkComputeValuesFLAC     2507576       2507576       +0.00%

As you can see, WAV performance has increased quite a bit. Nice work!

Impressive improvements!

@mdlayher that is awesome! It's nice to see it in real-world code =)

I still want to investigate a few other safe performance enhancements, since binary.Read is still the main performance problem.

After I am done with safe performance enhancements I will create an unsafe branch so we can compare the benchmarks and the size of the code to determine if it is worth it.

Moving this issue to azul3d-legacy/audio-wav#8