WAV decoder performance
slimsag opened this issue · 13 comments
I've added a benchmark to the audio/wav decoder (590de9cb01f017e31c5f798028fef1920dcf36a2) and using it pprof shows that 76% of the time is spent during binary.Read operations (it uses reflect, so most time is spent allocating slices with runtime.makeslice).
I think this issue can be solved by buffering the data into a slice, and then performing unsafe conversions where possible.
@mewmew I've been thinking about the best way to go forward with the audio package and friends. It got me thinking that we need audio codecs independent of their file (container formats):
- azul3d.org/audio/codec/pcm.v1 could handle fast translation of binary data to PCM8/PCM16/PCM24/PCM32/F32/F64/MuLaw/ALaw and back. The .wav file decoder (azul3d.org/audio/wav.v1) would utilize this package.
- azul3d.org/audio/codec/flac.v1 could handle decoding and encoding of FLAC streams (they aren't tied to OGG files). Then the .flac decoder (azul3d.org/audio/flac.v1) would utilize this package.
Based on my (very rough) benchmarking for mdlayher/waveform, any improvement to audio decoding would be hugely helpful. It is definitely a pretty lengthy process, and if optimizations can be made (removing binary.Read) at the expense of a few more lines of code, it would almost certainly be worth the trouble.
It's great that you are adding benchmarks. I feel a bit uneasy about using unsafe. If we can't find any other way to find similar performance improvements without using unsafe, and the improvements are substantial I guess it cannot be helped. At the very least we should make sure not to do too much magic with unsafe; basically only using it to avoid duplicate allocations.
azul3d.org/audio/codec/flac.v1 could handle decoding and encoding of FLAC streams (they aren't tied to OGG files). Then the .flac decoder (azul3d.org/audio/flac.v1) would utilize this package.
The audio sample decoding is already performed by the frame package, metadata decoding is performed by the meta package, and the .flac file container format is decoded by the flac package.
It would be trivial to implement a front-end (azul3d.org/audio/codec/flac.v1) for the audio sample decoding, using the frame package as a backend.
I've added more detailed benchmarks (azul3d-legacy/audio-wav@471a994), and improved the tests for the wav decoder. The benchmarks below show the time for decoding an entire 11s wav file:
BenchmarkDecodeFloat32 30 533053090 ns/op
BenchmarkDecodeFloat64 30 538004120 ns/op
BenchmarkDecodeUint8 20 570293048 ns/op
BenchmarkDecodeInt16 20 563579666 ns/op
BenchmarkDecodeInt24 30 527503099 ns/op
BenchmarkDecodeInt32 30 524333476 ns/op
BenchmarkDecodeALaw 20 564546919 ns/op
BenchmarkDecodeMuLaw 20 574923546 ns/op
Maybe we will want to use shorter (one second?) wav files for the benchmarks since they must be measured in ns/op. FWIW 500000000ns == 0.5s, so we are able to decode an entire 11s wav file right now in roughly 0.5s total (this is on a rather old Pentium dual core laptop).
I feel a bit uneasy about using unsafe. If we can't find any other way to find similar performance improvements without using unsafe, and the improvements are substantial I guess it cannot be helped. At the very least we should make sure not to do too much magic with unsafe; basically only using it to avoid duplicate allocations.
Right. I feel this way too. I like clean, pure, and idiomatic Go code so I think we should only use unsafe if it:
- Gives us a significant performance boost.
- Doesn't require too much unsafe code.
- Can be well documented and easily understood.
- Most important: Is impossible to do with plain Go code.
I will investigate further.
The audio sample decoding is already performed by the frame package, metadata decoding is performed by the meta, and the .flac file container format is decoded by the flac package.
It would be trivial to implement a front-end (azul3d.org/audio/codec/flac.v1) for the audio sample decoding, using the frame package as a backend.
Yeah, I had this exact idea in mind. I think the whole idea of azul3d.org/audio/codec needs more thought though. I think we shouldn't do it unless someone would use it (i.e. if someone was implementing a video decoder or something and wanted to use the flac decoder).
Based on my (very rough) benchmarking for mdlayher/waveform, any improvement to audio decoding would be hugely helpful. It is definitely a pretty lengthy process, and if optimizations can be made (removing binary.Read) at the expense of a few more lines of code, it would almost certainly be worth the trouble.
I completely agree. It will be interesting to see how these improvements work for your waveform package. =) Maybe you should create some benchmarks for it (if there aren't already).
I do have some basic benchmarks for reading audio and computing the root mean square of audio.F64Samples. These files are 5 seconds in duration. It's interesting how much faster FLAC is than WAV.
[zsh|matt@matt-2]:~/go/waveform 0 (master) ± go test -run=NONE -bench=Compute
PASS
BenchmarkComputeValuesWAV 10 222610476 ns/op
BenchmarkComputeValuesFLAC 20 80487883 ns/op
ok github.com/mdlayher/waveform 4.146s
I don't have any sample reading loop-specific benchmarks yet, but it might be worth looking into. I'd be glad to try out any improvements that are made!
I've just made a commit that improves the decoder across the board by 16-22%, by removing binary.Size calls from tight loops (not using unsafe). More to come.
Type | Time Spent | Before | After |
---|---|---|---|
F32 | -20% | 0.518081917s | 0.414876430s |
F64 | -22% | 0.547400233s | 0.431367131s |
PCM8 | -20% | 0.491020468s | 0.390492699s |
PCM16 | -19% | 0.504408065s | 0.409407364s |
PCM24 | -18% | 0.657118984s | 0.539338196s |
PCM32 | -19% | 0.516224602s | 0.416405221s |
ALaw | -16% | 0.533577493s | 0.445839555s |
MuLaw | -20% | 0.488511090s | 0.392282710s |
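As a rough illustration of the kind of change described above (not the actual commit), the sketch below computes binary.Size once before the loop instead of once per sample. The function names readInt16s and decodeInt16s are hypothetical, and little-endian input is assumed because that is what WAV files use.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// readInt16s fills out with little-endian samples from r and reports the
// number of bytes consumed. (Hypothetical helper, for illustration only.)
func readInt16s(r io.Reader, out []int16) (int, error) {
	var sample int16
	size := binary.Size(sample) // computed once, outside the tight loop
	read := 0
	for i := range out {
		if err := binary.Read(r, binary.LittleEndian, &sample); err != nil {
			return read, err
		}
		out[i] = sample
		read += size // reuses the hoisted value instead of calling binary.Size again
	}
	return read, nil
}

// decodeInt16s is a convenience wrapper that decodes from a byte slice.
func decodeInt16s(data []byte) ([]int16, int, error) {
	out := make([]int16, len(data)/2)
	n, err := readInt16s(bytes.NewReader(data), out)
	return out, n, err
}

func main() {
	samples, n, err := decodeInt16s([]byte{0x01, 0x00, 0xff, 0xff})
	fmt.Println(samples, n, err)
}
```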
Here is yet another round without using unsafe. This time we get a 9-11% performance improvement by simply moving variable declarations outside of tight loops in the decoder.
Type | Time Spent | Before | After |
---|---|---|---|
F32 | -9% | 0.414876430s | 0.377842782s |
F64 | -9% | 0.431367131s | 0.392514910s |
PCM8 | -8% | 0.390492699s | 0.359224923s |
PCM16 | -9% | 0.409407364s | 0.374009324s |
PCM24 | -8% | 0.539338196s | 0.495933463s |
PCM32 | -11% | 0.416405221s | 0.372523136s |
ALaw | -8% | 0.445839555s | 0.409146299s |
MuLaw | -8% | 0.392282710s | 0.360234622s |
Great news: Roughly 60% speed improvement for PCM8/16/32 and ALaw/MuLaw decoding!
encoding/binary actually has fast-paths implemented for all standard signed and unsigned integer types. Sadly not for float32 or float64 types though (which I can't add due to Go's version 1 compatibility guarantee as it would change an exposed interface type in that package).
Type | Time Spent | Before | After |
---|---|---|---|
F32 | -2% | 0.440716364s | 0.432334441s |
F64 | -4% | 0.474706062s | 0.453572434s |
PCM8 | -64% | 0.420314742s | 0.153145169s |
PCM16 | -62% | 0.427091070s | 0.162676099s |
PCM24 | -0% | 0.571109580s | 0.571630756s |
PCM32 | -62% | 0.439910450s | 0.166665237s |
ALaw | -57% | 0.477538930s | 0.204175348s |
MuLaw | -62% | 0.417263679s | 0.157148155s |
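One way to reach the integer fast paths mentioned above is to hand binary.Read a slice of a built-in integer type and convert afterwards. The sketch below assumes the fast paths are matched by a type switch on the built-in types, so a slice of a named type would fall back to the reflect-based path. The PCM16 type and both function names are hypothetical stand-ins, not the audio package's real API, and this is not necessarily how the commit did it.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// PCM16 is a hypothetical named sample type used only for this sketch.
type PCM16 int16

// readPCM16 reads n little-endian samples through a plain []int16, which
// binary.Read can handle without reflection, then converts to PCM16.
func readPCM16(r io.Reader, n int) ([]PCM16, error) {
	raw := make([]int16, n)
	if err := binary.Read(r, binary.LittleEndian, raw); err != nil {
		return nil, err
	}
	out := make([]PCM16, n)
	for i, v := range raw {
		out[i] = PCM16(v)
	}
	return out, nil
}

// decodePCM16 is a convenience wrapper that decodes from a byte slice.
func decodePCM16(data []byte) ([]PCM16, error) {
	return readPCM16(bytes.NewReader(data), len(data)/2)
}

func main() {
	samples, err := decodePCM16([]byte{0x34, 0x12, 0x00, 0x80})
	fmt.Println(samples, err)
}
```

The extra conversion loop costs one pass over the samples, which is cheap compared to a reflection-driven decode of every element.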
I am now happy with all decoding performance except F32, F64, PCM24, and potentially ALaw. I'll see what else I can do for those by removing their dependency on binary.Read and doing the binary read ourselves (I think we can do it without unsafe too, but I'm not sure yet).
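For PCM24 in particular, doing the binary read by hand without unsafe could look roughly like the sketch below. It assumes packed little-endian 3-byte samples and returns them sign-extended in int32; the function name decodePCM24 is hypothetical and this is not the decoder's actual code.

```go
package main

import "fmt"

// decodePCM24 converts packed little-endian 24-bit samples into int32 values.
// Trailing bytes that do not form a whole sample are ignored.
// (Hypothetical helper, for illustration only.)
func decodePCM24(data []byte) []int32 {
	out := make([]int32, len(data)/3)
	for i := range out {
		b := data[i*3 : i*3+3]
		// Place the three bytes in the top 24 bits of an int32, then use an
		// arithmetic right shift to sign-extend down to the final value.
		v := int32(b[0])<<8 | int32(b[1])<<16 | int32(b[2])<<24
		out[i] = v >> 8
	}
	return out
}

func main() {
	data := []byte{
		0x01, 0x00, 0x00, // smallest positive value
		0xff, 0xff, 0xff, // -1
		0x00, 0x00, 0x80, // most negative 24-bit value
	}
	fmt.Println(decodePCM24(data))
}
```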
Funny enough, we can use binary.Read into a uint32 and uint64 respectively (which has a fast-path), and then use math.Float32frombits and math.Float64frombits to get the data back out (which is written internally using unsafe). This gives us a 64-65% improvement in decoding F32/F64 wav data.
Type | Time Spent | Before | After |
---|---|---|---|
F32 | -64% | 0.487533093s | 0.175575936s |
F64 | -65% | 0.525656683s | 0.181460060s |
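The trick described above can be sketched as follows for F32; the F64 case is the same with []uint64 and math.Float64frombits. The function names are hypothetical, and little-endian input is assumed.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
	"math"
)

// readFloat32s reads n little-endian IEEE 754 samples by first reading their
// raw bit patterns as uint32 (an integer type binary.Read handles quickly)
// and then reinterpreting each pattern as a float32.
func readFloat32s(r io.Reader, n int) ([]float32, error) {
	bits := make([]uint32, n)
	if err := binary.Read(r, binary.LittleEndian, bits); err != nil {
		return nil, err
	}
	out := make([]float32, n)
	for i, b := range bits {
		out[i] = math.Float32frombits(b)
	}
	return out, nil
}

// decodeFloat32s is a convenience wrapper that decodes from a byte slice.
func decodeFloat32s(data []byte) ([]float32, error) {
	return readFloat32s(bytes.NewReader(data), len(data)/4)
}

func main() {
	// Bit patterns 0x3f800000 and 0xc0000000, stored little-endian.
	samples, err := decodeFloat32s([]byte{0x00, 0x00, 0x80, 0x3f, 0x00, 0x00, 0x00, 0xc0})
	fmt.Println(samples, err)
}
```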
Here's a quick benchcmp testing my waveform package using audio/wav.v1 vs. audio-wav master.
[zsh|matt@nerr-2]:~/go/waveform 0 *(master) ± benchcmp old.txt new.txt
benchmark old ns/op new ns/op delta
BenchmarkComputeValuesWAV 90622153 42283885 -53.34%
BenchmarkComputeValuesFLAC 33116305 32799771 -0.96%
benchmark old allocs new allocs delta
BenchmarkComputeValuesWAV 553779 220539 -60.18%
BenchmarkComputeValuesFLAC 771 771 +0.00%
benchmark old bytes new bytes delta
BenchmarkComputeValuesWAV 24262212 4246494 -82.50%
BenchmarkComputeValuesFLAC 2507576 2507576 +0.00%
As you can see, WAV performance has increased quite a bit. Nice work!
Impressive improvements!
@mdlayher that is awesome! It's nice to see it in real-world code =)
I still want to investigate a few other safe performance enhancements -> binary.Read is still the main performance problem.
After I am done with safe performance enhancements I will create an unsafe branch, so we can compare the benchmarks and the size of the code to determine if it is worth it.
Moving this issue to azul3d-legacy/audio-wav#8