lostromb/concentus.oggfile

Android performance

Closed this issue · 6 comments

Thank you for this library. I have a cross-platform project that currently uses it with great success on Xamarin for Windows, iOS and macOS, but on Xamarin Android I can't get the same level of performance I see on those three platforms.

I'm testing it on a Motorola Moto E (4) with Android 7.1 API 25.

I'm using Concentus 1.1.7 and Concentus.OggFile 1.0.3.

This is my encoding method:

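using System.IO;
using Concentus.Enums;    // OpusApplication
using Concentus.Oggfile;  // OpusOggWriteStream
using Concentus.Structs;  // OpusEncoder

// sampleRate (48000) and channelCount (1) are static fields defined elsewhere in the class.
public static byte[] OpusEncode(short[] samples)
{
    using (MemoryStream ms = new MemoryStream())
    {
        var encoder = new OpusEncoder(sampleRate, channelCount, OpusApplication.OPUS_APPLICATION_VOIP);
        var oggOut = new OpusOggWriteStream(encoder, ms);
        oggOut.WriteSamples(samples, 0, samples.Length); // nearly all of the time is spent here
        oggOut.Finish();                                 // writes the final Ogg page
        return ms.ToArray();
    }
}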

I'm working with a sample rate of 48000 and one channel. The buffer is also 48000. The input encoding is Android.Media.Encoding.Pcm16bit.

On Windows, MacOS and iOS I'm used to seeing this function return in approximately 100-200 milliseconds. Whereas on Android I'm seeing it return in approximately 2000 milliseconds, which is way too slow for this app.
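To put those numbers in perspective, it helps to compare the encode time with the length of audio being encoded. The helper below is a hypothetical sketch; it assumes the 48000 buffer is 48000 mono samples (one second at 48 kHz) and that OpusOggWriteStream splits input into 20 ms frames, which is an assumption here rather than something stated in this thread.

```csharp
using System;

// Hypothetical helper: relates a measured encode time to the audio it covers.
public static class EncodeBudget
{
    // Number of frames needed for `samples` mono samples at `sampleRate`,
    // assuming a fixed frame length of `frameMs` milliseconds (rounded up).
    public static int FrameCount(int samples, int sampleRate, int frameMs)
    {
        int samplesPerFrame = sampleRate * frameMs / 1000;
        return (samples + samplesPerFrame - 1) / samplesPerFrame;
    }

    // Real-time factor: encode time divided by audio duration.
    // A value above 1.0 means encoding is slower than real time.
    public static double RealTimeFactor(double encodeMs, int samples, int sampleRate)
    {
        double audioMs = samples * 1000.0 / sampleRate;
        return encodeMs / audioMs;
    }
}
```

For example, FrameCount(48000, 48000, 20) is 50, so 2000 ms works out to about 40 ms of work per 20 ms frame, and RealTimeFactor(2000, 48000, 48000) is 2.0. If Encode is called about once per second, that rate cannot keep up with live input, whereas 100-200 ms (2-4 ms per frame) leaves plenty of headroom.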

I can switch it to OPUS_APPLICATION_RESTRICTED_LOWDELAY and get about a 4x improvement, but not the 10x I'm looking for, and the quality suffers.

Changing the sample rate to 16000 doesn't seem to make any difference which surprises me.

The audio plays back fine once it has been encoded.

I know that android has some built-in support for Opus for decoding, but as far as I can see not for encoding.

I know the readme on this repo says that wrapping native libopus should speed things up by about 50%, but since the slowdown I'm seeing is much larger than that, I wonder if I'm missing something in my setup. Any advice would be appreciated.

I'm looking at doing my own build of libopus and using the p-invoke signatures at https://github.com/jasl8r/P-Opus but so far we've gotten away without having to fuss with the NDK.

I have a minimal reproduction .sln if anyone is interested which I can attach.

Hmmmm. I'm not an expert in Xamarin or Android but I can give a few general pointers to try and see where the problem is coming from:

  • Does it still take the same amount of time if you isolate the encoding from the audio capture? Try embedding or generating a static waveform and then running the encode loop to see how well it performs on that.
  • Does the performance change after JITting, when the encoder loop has run a few times?
  • Does anything change if you cache the instance of OpusEncoder statically and just call Reset() on it rather than reinstantiating a new encoder each time?
  • Does it change if you target a different framework, e.g. .NET Core? In my own testing, simply compiling the library with .NET Core 2.0 gives a substantial performance improvement, though that may come from platform-specific optimizations that are more pronounced on x64 than on ARM. I don't know if it will make any difference, but you could also try integrating the Concentus source directly into your project and compiling it into the same module.
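For the first suggestion, a fixed test signal can be generated in code so that AudioRecord and the rest of the capture path are out of the loop entirely. This is a hypothetical sketch (SyntheticAudio and Sine are illustrative names, not part of Concentus); it produces mono 16-bit samples in the same short[] shape the OpusEncode method takes.

```csharp
using System;

// Hypothetical test-signal generator: builds a fixed PCM16 waveform so the
// encoder can be timed without any Android capture API involved.
public static class SyntheticAudio
{
    // Returns `seconds` of a mono sine tone as 16-bit samples at `sampleRate` Hz.
    // `amplitude` is a fraction of full scale (0..1).
    public static short[] Sine(int sampleRate, double seconds, double frequencyHz, double amplitude = 0.5)
    {
        if (amplitude < 0 || amplitude > 1)
            throw new ArgumentOutOfRangeException(nameof(amplitude));

        int count = (int)Math.Round(sampleRate * seconds);
        var samples = new short[count];
        for (int i = 0; i < count; i++)
        {
            double t = (double)i / sampleRate; // time of sample i in seconds
            samples[i] = (short)(amplitude * short.MaxValue * Math.Sin(2 * Math.PI * frequencyHz * t));
        }
        return samples;
    }
}
```

Something like `short[] oneSecond = SyntheticAudio.Sine(48000, 1.0, 440.0);` gives a buffer the same size as the one described above, which can be passed straight to OpusEncode. A pure tone is not speech, so absolute timings may differ somewhat from real microphone input, but it is enough to tell whether the encoder alone is slow.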

Thanks for your thoughts. Tried a few things.

Tried compiling it into the project. Didn't help. It seemed promising, but I think it's pretty much the same as pulling in the NuGet package.

It's probably not surprising that most of the execution time is spent in the call to OpusOggWriteStream.WriteSamples. Constructing the encoder is effectively instant, so reusing it doesn't seem to matter; I also tried calling Reset on a single OpusEncoder instance.

I don't think .NET Core is an option on Android. It looks like it would be more trouble than it's worth to get going. See what this requires, for example: https://github.com/qmfrederik/coredroid.

I call Encode roughly once per second and don't see a difference between calls. The encode performance is roughly constant.

Here's a good high level description about how Xamarin works with the android runtime.

https://docs.microsoft.com/en-us/xamarin/cross-platform/app-fundamentals/building-cross-platform-applications/understanding-the-xamarin-mobile-platform

C# is compiled to IL and packaged with MonoVM + JIT’ing. Unused classes in the framework are stripped out during linking. The application runs side-by-side with Java/ART (Android runtime) and interacts with the native types via JNI (see Xamarin.Android Limitations)

More details here: https://docs.microsoft.com/en-us/xamarin/android/internals/architecture

It seems like sometimes the mono runtime keeps to itself, or interacts directly with the linux kernel (especially for IO), and sometimes it goes out to Java droid land and back again.

I guess if the parts of Mono that are getting called are actually running side-by-side with Java and calling it through JNI, that sounds... well, like a bit of a bottleneck. In other words, it could be "slower" than doing things inside the Java runtime.

It could be that some part of the Concentus code, when translated to Xamarin android, is ending up in this "bridged state" when it could be taking a more direct path to the encoding result.

I'm attaching my minimal repro.

ConcentusTest.zip

In that performance measurement can you get any deeper than OpusOggWriteStream.WriteSamples()?

I ask because the bottleneck is going to fall somewhere in 3 general parts:

  1. A routine reads audio from the AudioRecord device. This almost certainly makes use of the JNI / Java Android bridge which the article alludes to.
  2. The encoder runs. This consists of running the frame encoder and ogg muxer, but at its heart the entire process is just primitive arithmetic, loops, and arrays, with no other system dependencies whatsoever.
  3. The ogg output gets piped to some file or network output, which might potentially use native Android APIs again.

Your minimal repro doesn't do #3 so we can rule it out. I believe we can also rule out #1 based on the timestamping metrics used in your code (though a Stopwatch would be much more accurate for profiling, just FYI). So the only possible conclusion I can draw is that the Mono runtime, for whatever reason, is just really crap at optimizing the encoder algorithms. It doesn't seem reasonable that it would be 10x slower than other runtimes in this particular case, but maybe Opus just makes extreme use of some feature that isn't optimized very well like cached heap access or array/loop bounds check elimination. I really don't know. The Xamarin article also mentions that Android is the only platform that can't do ahead-of-time compilation so it makes sense that this is the one slow platform out of all the ones you've tested.
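Following the Stopwatch suggestion, a small harness can separate JIT warm-up from steady-state timing. This is a hypothetical sketch (EncodeTimer is an illustrative name); it takes the encode step as a delegate, so it can wrap the OpusEncode method from the original post or any one of the three stages above timed on its own.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

// Hypothetical timing harness: warm-up runs first (so JIT compilation is not
// counted), then timed runs measured with Stopwatch's high-resolution clock.
public static class EncodeTimer
{
    public static double[] Measure(Func<short[], byte[]> encode, short[] input,
                                   int warmupRuns = 3, int timedRuns = 10)
    {
        if (encode == null) throw new ArgumentNullException(nameof(encode));
        if (timedRuns < 1) throw new ArgumentOutOfRangeException(nameof(timedRuns));

        // Warm-up: outputs are discarded; this only gives the JIT a chance to run.
        for (int i = 0; i < warmupRuns; i++) encode(input);

        var timings = new double[timedRuns];
        var sw = new Stopwatch();
        for (int i = 0; i < timedRuns; i++)
        {
            sw.Restart();
            byte[] output = encode(input);
            sw.Stop();
            timings[i] = sw.Elapsed.TotalMilliseconds;
            if (output == null || output.Length == 0)
                throw new InvalidOperationException("Encoder produced no output.");
        }
        return timings;
    }

    // Median is less sensitive than the mean to a single slow run (e.g. a GC pause).
    public static double Median(double[] values)
    {
        if (values == null || values.Length == 0)
            throw new ArgumentException("At least one value is required.", nameof(values));
        var sorted = values.OrderBy(v => v).ToArray();
        int mid = sorted.Length / 2;
        return sorted.Length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }
}
```

With the method from the first post, something like `double ms = EncodeTimer.Median(EncodeTimer.Measure(OpusEncode, samples));` reports a steady-state figure. If the first warm-up run is much slower than the timed ones, JIT cost is a factor; if they are all equally slow, the generated code itself is the likely problem.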

From here it seems you have 3 options:

  1. If the project allows it, you can compile with /unsafe and define the UNSAFE macro, which will enable some experimental optimizations which may override the "crap" ones that the Mono JIT seems to be generating
  2. You can use the native Opus C library and P/Opus, which will introduce native compilation but also offer some decent benefits such as ARM NEON optimizations for even better performance
  3. You can move to some other runtime that isn't Mono, which for the time being seems like it will be difficult
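For option 2, the native encoder is reached through a handful of P/Invoke declarations. The sketch below is a hypothetical minimal binding written against the public libopus C API in opus.h (opus_encoder_create, opus_encode, opus_encoder_destroy); it is not P-Opus's API, and it assumes a libopus.so built with the NDK for the device's ABI is bundled with the app and loadable under the name "opus".

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical minimal P/Invoke binding to native libopus.
// Assumes libopus.so is packaged with the app and resolvable as "opus".
internal static class OpusNative
{
    private const string Lib = "opus";

    // OPUS_APPLICATION_* and OPUS_OK values from opus_defines.h.
    public const int OPUS_OK = 0;
    public const int OPUS_APPLICATION_VOIP = 2048;
    public const int OPUS_APPLICATION_AUDIO = 2049;
    public const int OPUS_APPLICATION_RESTRICTED_LOWDELAY = 2051;

    // OpusEncoder *opus_encoder_create(opus_int32 Fs, int channels, int application, int *error);
    [DllImport(Lib, CallingConvention = CallingConvention.Cdecl)]
    public static extern IntPtr opus_encoder_create(int Fs, int channels, int application, out int error);

    // opus_int32 opus_encode(OpusEncoder *st, const opus_int16 *pcm, int frame_size,
    //                        unsigned char *data, opus_int32 max_data_bytes);
    // Returns the packet length in bytes, or a negative error code.
    [DllImport(Lib, CallingConvention = CallingConvention.Cdecl)]
    public static extern int opus_encode(IntPtr st, short[] pcm, int frame_size, byte[] data, int max_data_bytes);

    // void opus_encoder_destroy(OpusEncoder *st);
    [DllImport(Lib, CallingConvention = CallingConvention.Cdecl)]
    public static extern void opus_encoder_destroy(IntPtr st);
}
```

Typical use would be opus_encoder_create(48000, 1, OPUS_APPLICATION_VOIP, out int err), checking err == OPUS_OK, then calling opus_encode once per 960-sample (20 ms) frame with an output buffer of a few KB, and opus_encoder_destroy when done. Note that opus_encode produces raw Opus packets: OpusOggWriteStream takes a Concentus OpusEncoder, so the Ogg container would have to be written some other way.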

One other random thought: does Android have some kind of aggressive CPU scaling watchdog that tries to keep battery usage low when it thinks apps aren't working too hard? Maybe because you're running this encode on a background task and its rate is limited by real-time input from the microphone, the OS thinks "this app isn't working too hard, I can take it easy for a while". Just a shot in the dark, really.

Thanks again for your ideas.

Tried the idea of compiling with unsafe and the UNSAFE macro. No luck. I think my best bet is to try to p/invoke libopus. I can't bail on mono unfortunately since this project shares tons of code with the other projects already built.

I'd like to profile it on a lower level than OpusOggWriteStream.WriteSamples. If I do I'll update this ticket with what I find.

I just wanted to close this out with some new information that hopefully will help someone. This issue has everything to do with how Xamarin performs on Android and little to do with this excellent library.

I'm able to get close to the performance I need when I compile in Release mode, make some tweaks to sample rate, and use the low latency Opus profile. The difference in encode times between Debug and Release surprised me. I get at least 4x improvement in encode times when I build in Release mode. That "seems" a bigger jump than I'm used to, but then again I don't often profile release builds.
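Since the build configuration made such a difference, it can be worth recording it next to every timing. The hypothetical helper below reads the DebuggableAttribute that the C# compiler emits, whose IsJITOptimizerDisabled flag is set for unoptimized (typically Debug) builds, and also logs the processor count for comparisons between devices. It only reports how an assembly was compiled; whether a particular runtime honours that flag, and what else a Xamarin.Android Debug build changes, is outside what this check can tell you.

```csharp
using System;
using System.Diagnostics;
using System.Reflection;

// Hypothetical diagnostic: reports whether an assembly was compiled with JIT
// optimizations disabled, plus the device's logical processor count.
public static class BuildInfo
{
    public static bool IsJitOptimizerDisabled(Assembly assembly)
    {
        var attr = assembly.GetCustomAttribute<DebuggableAttribute>();
        return attr != null && attr.IsJITOptimizerDisabled;
    }

    public static string Describe(Assembly assembly)
    {
        string mode = IsJitOptimizerDisabled(assembly)
            ? "JIT optimizations disabled (Debug-style build)"
            : "JIT optimizations enabled";
        return $"{assembly.GetName().Name}: {mode}, {Environment.ProcessorCount} logical processors";
    }
}
```

When Concentus is compiled into the app project, passing the assembly that contains the encoder (for example `typeof(OpusEncoder).Assembly`) shows whether the encoder code itself was built optimized, rather than just the UI project.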

In my browsing I'm gleaning that people want AOT and LLVM more on Android... Sadly, right now AOT and LLVM are "enterprise" and "experimental" Xamarin Droid options. I haven't bothered to test how much of a bump I'd get. I'll probably end up there, though.

Lesson learned: for cross-platform .NET code, how you compile matters a lot more than you might expect, depending on your target platform...

It still troubles me how blazing fast encoding is when I compile even in Debug on Windows. Windows would appear to be at least 20 to 30 times faster, but it may simply be that I have 2 more processors on my Windows VM than I do on the Android device, which could end up making all the difference in the world for this code, especially due to the way Droid dedicates a blocking thread to receiving mic input....