barsoosayque/libgdx-oboe

JNI DETECTED ERROR IN APPLICATION: use of deleted global reference

la-matthew-yang opened this issue ยท 21 comments

I have been struggling with this issue for a while.

I have two questions:

  1. What could potentially cause "use of deleted global reference" in libgdx-oboe?

  2. Is there any way to show function name and line number for libgdx-oboe in backtrace?
    For all the other so files, there are function name and lines numbers. But for libgdx-oboe (from #12 to #16), there are no such info. Is there a config that I can set to show such info?

Thanks a lot.

10-23 16:22:20.547 32591 32619 F libc    : Fatal signal 6 (SIGABRT), code -6 in tid 32619 (Thread-1623)
10-23 16:22:20.611   324   324 F DEBUG   : *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
10-23 16:22:20.611   324   324 F DEBUG   : Build fingerprint: 'asus/CN_P024/P024_3:6.0.1/MMB29M/CN_P024-V5.7.3-20161011:user/release-keys'
10-23 16:22:20.611   324   324 F DEBUG   : Revision: '0'
10-23 16:22:20.611   324   324 F DEBUG   : ABI: 'arm'
10-23 16:22:20.612   324   324 F DEBUG   : pid: 32591, tid: 32619, name: Thread-1623  >>> com.xuexue.ai.english.android <<<
10-23 16:22:20.612   324   324 F DEBUG   : signal 6 (SIGABRT), code -6 (SI_TKILL), fault addr --------
10-23 16:22:20.659   324   324 F DEBUG   : Abort message: 'art/runtime/java_vm_ext.cc:410] JNI DETECTED ERROR IN APPLICATION: use of deleted global reference 0x2005aa'
10-23 16:22:20.660   324   324 F DEBUG   :     r0 00000000  r1 00007f6b  r2 00000006  r3 98d63978
10-23 16:22:20.660   324   324 F DEBUG   :     r4 98d63980  r5 98d63930  r6 0000000b  r7 0000010c
10-23 16:22:20.660   324   324 F DEBUG   :     r8 b8870700  r9 b4c37378  sl 00000000  fp 00000000
10-23 16:22:20.660   324   324 F DEBUG   :     ip 00000006  sp 98d63198  lr b6d6bf25  pc b6d6e320  cpsr 40070010
10-23 16:22:20.685   324   324 F DEBUG   : 
10-23 16:22:20.685   324   324 F DEBUG   : backtrace:
10-23 16:22:20.685   324   324 F DEBUG   :     #00 pc 00044320  /system/lib/libc.so (tgkill+12)
10-23 16:22:20.685   324   324 F DEBUG   :     #01 pc 00041f21  /system/lib/libc.so (pthread_kill+32)
10-23 16:22:20.685   324   324 F DEBUG   :     #02 pc 0001ba4f  /system/lib/libc.so (raise+10)
10-23 16:22:20.685   324   324 F DEBUG   :     #03 pc 00018bf1  /system/lib/libc.so (__libc_android_abort+34)
10-23 16:22:20.685   324   324 F DEBUG   :     #04 pc 000167b0  /system/lib/libc.so (abort+4)
10-23 16:22:20.685   324   324 F DEBUG   :     #05 pc 0032140d  /system/lib/libart.so (_ZN3art7Runtime5AbortEv+212)
10-23 16:22:20.685   324   324 F DEBUG   :     #06 pc 000f3cdd  /system/lib/libart.so (_ZN3art10LogMessageD2Ev+2212)
10-23 16:22:20.686   324   324 F DEBUG   :     #07 pc 0024f86d  /system/lib/libart.so (_ZN3art9JavaVMExt8JniAbortEPKcS2_+1524)
10-23 16:22:20.686   324   324 F DEBUG   :     #08 pc 0024fc79  /system/lib/libart.so (_ZN3art9JavaVMExt9JniAbortFEPKcS2_z+68)
10-23 16:22:20.686   324   324 F DEBUG   :     #09 pc 0033e3a7  /system/lib/libart.so (_ZNK3art6Thread13DecodeJObjectEP8_jobject+686)
10-23 16:22:20.686   324   324 F DEBUG   :     #10 pc 000fff25  /system/lib/libart.so (_ZN3art11ScopedCheck5CheckERNS_18ScopedObjectAccessEbPKcPNS_12JniValueTypeE.constprop.95+900)
10-23 16:22:20.686   324   324 F DEBUG   :     #11 pc 00107837  /system/lib/libart.so (_ZN3art8CheckJNI14GetObjectClassEP7_JNIEnvP8_jobject+370)
10-23 16:22:20.686   324   324 F DEBUG   :     #12 pc 00046b33  /data/app/com.test.android-1/lib/arm/liblibgdx-oboe.so
10-23 16:22:20.686   324   324 F DEBUG   :     #13 pc 00045ab7  /data/app/com.test.android-1/lib/arm/liblibgdx-oboe.so
10-23 16:22:20.686   324   324 F DEBUG   :     #14 pc 0004421b  /data/app/com.test.android-1/lib/arm/liblibgdx-oboe.so
10-23 16:22:20.686   324   324 F DEBUG   :     #15 pc 0004409f  /data/app/com.test.android-1/lib/arm/liblibgdx-oboe.so
10-23 16:22:20.686   324   324 F DEBUG   :     #16 pc 0005111d  /data/app/com.test.android-1/lib/arm/liblibgdx-oboe.so
10-23 16:22:20.686   324   324 F DEBUG   :     #17 pc 000085d5  /system/lib/libwilhelm.so
10-23 16:22:20.686   324   324 F DEBUG   :     #18 pc 000609c1  /system/lib/libmedia.so (_ZN7android10AudioTrack18processAudioBufferEv+1232)
10-23 16:22:20.686   324   324 F DEBUG   :     #19 pc 00061631  /system/lib/libmedia.so (_ZN7android10AudioTrack16AudioTrackThread10threadLoopEv+172)
10-23 16:22:20.686   324   324 F DEBUG   :     #20 pc 0001011d  /system/lib/libutils.so (_ZN7android6Thread11_threadLoopEPv+112)
10-23 16:22:20.686   324   324 F DEBUG   :     #21 pc 00041823  /system/lib/libc.so (_ZL15__pthread_startPv+30)
10-23 16:22:20.686   324   324 F DEBUG   :     #22 pc 00019285  /system/lib/libc.so (__start_thread+6)
10-23 16:22:22.683  1196  2349 E NativeCrashListener: Exception dealing with report
10-23 16:22:22.683  1196  2349 E NativeCrashListener: android.system.ErrnoException: read failed: EAGAIN (Try again)
10-23 16:22:22.683  1196  2349 E NativeCrashListener: 	at libcore.io.Posix.readBytes(Native Method)
10-23 16:22:22.683  1196  2349 E NativeCrashListener: 	at libcore.io.Posix.read(Posix.java:169)
10-23 16:22:22.683  1196  2349 E NativeCrashListener: 	at libcore.io.BlockGuardOs.read(BlockGuardOs.java:230)
10-23 16:22:22.683  1196  2349 E NativeCrashListener: 	at android.system.Os.read(Os.java:367)
10-23 16:22:22.683  1196  2349 E NativeCrashListener: 	at com.android.server.am.NativeCrashListener.consumeNativeCrashData(NativeCrashListener.java:240)
10-23 16:22:22.683  1196  2349 E NativeCrashListener: 	at com.android.server.am.NativeCrashListener.run(NativeCrashListener.java:138)
  1. It's actually very odd. Either it's a multi-threading problem that I overlooked (related to get/set variables of kotlin classes from the native part of the code), or music end callback from setOnCompletionListener.
  2. All the libraries you have in your stacktrace are built with debug symbols in them (probably, you're running a debug build). But libgdx-oboe is distributed in the release version, without such symbols. The solution is to clone the repo, and declare a local dependency instead of the remote one. In that case, you'll have to setup NDK and all other requirements for native andriod development.

So are you using threads other than main in the app ? If not, are you using setOnCompletionListener method ?

I think I have finally figured out what is causing this issue. :)

Essentially, it is because completion listener is notified after OboeMusic.dispose() is already called. When this happens, calling jvm_class(context->GetObjectClass(new_callback)) causes JNI ERROR: use of deleted global reference.

I am not sure how this is possible. Because OboeMusic.stop() is always called before OboeMusic.dispose(). Maybe OboeMusic.stop() has no effect when the music is very close to completion?

I tried to solve this issue by calling OboeMusic.setOnCompletionListener(null) before OboeMusic.dispose(). But it seems that OboeMusic.setOnCompletionListener(null) is not supported.

Oh. It might actually be possible because of OboeMusic.setOnCompletionListener(null) since it doesn't seems to reset native music callback. Shouldn't be possible after OboeMusic.dispose(), because it just deletes the native music object (which actually calls the callback), though I'm not sure if this is some thread shenanigans, in that case everything might be possible.

I sadly can't test any of this, because my pc died (and my current can't run the heavy android toolchain). ๐Ÿ˜ž

How often does it crashes ? Are crashes regular or you can't reliably replicate them ? Does it crashes if no OboeMusic.setOnCompletionListener(null) is called ?

I think you are right about the thread shenanigans. Based on the logs, it seems that AudioTrack::AudioTrackThread::threadLoop() triggers completion listener right after another thread calls OboeMusic.dispose().

Do you think adding a listener check in on_complete could fix the issue?

OBOEMUSIC_METHOD(void, setOnCompletionListener) (JNIEnv* env, jobject self, jobject callback) {
    auto context = jni_context::acquire_thread();

    if (auto old_callback = get_var_as<_jobject>(env, self, "listener")) {
        context->DeleteGlobalRef(old_callback);
    }

    if (auto instance = get_var_as<std::shared_ptr<music>>(env, self, "music")) {
        if (callback) {
            auto new_callback = context->NewGlobalRef(callback);
            auto self_weak = context->NewWeakGlobalRef(self);
            set_var_as(env, self, "listener", new_callback);

            (*instance)->on_complete([self_weak, new_callback] {
                if (auto new_callback = get_var_as<_jobject>(env, self, "listener")) {
                    auto context = jni_context::acquire_thread();
                    auto callback_class = jvm_class(context->GetObjectClass(new_callback));
                    callback_class.execute_method<void(Music)>(new_callback, "onCompletion", Music{self_weak});
                }
            });
        }
    }
}

Off the top of my head, I think something like that might work:

OBOEMUSIC_METHOD(void, setOnCompletionListener) (JNIEnv* env, jobject self, jobject callback) {
    auto context = jni_context::acquire_thread();

    if (auto old_callback = get_var_as<_jobject>(env, self, "listener")) {
        context->DeleteGlobalRef(old_callback);
        old_callback = nullptr;
        set_var_as<_jobject>(env, self, "listener", nullptr);
    }

    if (auto instance = get_var_as<std::shared_ptr<music>>(env, self, "music")) {
        if (callback) {
            auto new_callback = context->NewGlobalRef(callback);
            auto self_weak = context->NewWeakGlobalRef(self);
            set_var_as(env, self, "listener", new_callback);

            (*instance)->on_complete([self_weak, new_callback] {
                if (auto new_callback = get_var_as<_jobject>(env, self, "listener")) {
                    auto context = jni_context::acquire_thread();
                    auto callback_class = jvm_class(context->GetObjectClass(new_callback));
                    callback_class.execute_method<void(Music)>(new_callback, "onCompletion", Music{self_weak});
                }
            });
        } else {
            (*instance)->on_complete(nullptr);
        }
    }
}

and

OBOEMUSIC_METHOD(void, dispose) (JNIEnv* env, jobject self) {
    if (auto instance = get_var_as<std::shared_ptr<music>>(env, self, "music")) {
        (*instance)->on_complete(nullptr);
    }
    if (auto callback = get_var_as<_jobject>(env, self, "listener")) {
        auto context = jni_context::acquire_thread();
        context->DeleteGlobalRef(callback);
        callback = nullptr;
        set_var_as<_jobject>(env, self, "listener", nullptr);
    }
    delete_var<std::shared_ptr<music>>(env, self, "music");
}

And if that doesn't work, then some spinlocks are required sadly

I think this looks good. I really want to give this a try. But I don't have much experience with NDT or ffmpeg, so I am not sure how long it will take for me to have everything setup. Does the build process work for Mac OS?

Does the build process work for Mac OS?

Yeah, I assume so, yes. You actually don't need anything other than Android NDK and regular android setup. ffmpeg library is pre-built, and normally shouldn't be rebuilt

Thanks! I will give it a try today.

I have the project setup. I am trying to debug the problem. Do you know how to generate the debug symbols so that I can use ndk-stack?

I got ndk-stack working. :) You code is well organized.

After some investigation, I think I have found the root cause of the issue.

It is actually not caused by native completion listener. If I remove the usage of the native completion listener completely, the issue still occurs. Although based on my limited testing, usage of native completion listener increases the chance of the issue occurring.

I think the issue is caused by concurrent modification of audio resources. If music::dispose and music::stop are never called, the issue doesn't occur at all.

The issue probably occurs when a music is being used in mixer::render by the system audio thread, while it is disposed or stopped by a game thread at exactly the time.

The resulting native error has many different stacktraces. Most of them have the message "fault addr". Some have "ConcurrentGC" in their stack trace.

Sometimes, the issue doesn't cause a crash. Instead, it causes Oboe system to stop working. Afterwards, Gdx.audio.newMusic() hands and causes app to get stuck.

I tried putting thread lock in different places, but couldn't get it to work.

The options that I can think of for fixing this issue are:

  1. Mark music as disposed or stopped in game thread and perform the actual dispose and stop in mixer::render to avoid concurrent usage
  2. Use one oboe stream with SharingMode::Shared for each music. This way, music is only modified by its own audio thread. The sound mixing is done by the Android system.

The second option is great. Don't know why I didn't use shared oboe streams in the first place ๐Ÿคท
That may still require a bunch of atomic_flag tests, though.

I should get my computer fixed soon, I'll look into it. Although, refactoring will likely take some time.

Big thanks for suggestions and tinkering with the code ๐Ÿ‘

@barsoosayque Do you have a preferred way of communication like WhatsApp, Skype, Email, etc? I would like to talk to you about some potential collaboration opportunities, if you don't mind.

Sure. Either telegram or email.

Great! I've sent you an email.

I have refactored callback logic in a0ad49b. And per my tests in 612edb2, I can't get any concurent crashes. I want to believe that the issue is somehow fixed, but pretty much open to suggestions on how to test it further.

After some thoughts, I think it's preferable to leave oboe stream as exclusive, since it offers the lowest latency.
https://github.com/google/oboe/blob/83f79e93e32dee9e2b5d3ced07dc94c82d4bc182/include/oboe/Definitions.h#L193-L213

Well, I would like to believe that this issue is fixed, so I will close it. If the crash still occurs, please reopen.