openfl/lime

Thread-related crashes when triggering HaxeObject callbacks from Java code in a Lime/OpenFL Extension

bazzisoft opened this issue · 15 comments

While upgrading some OpenFL extension libraries to work with the latest OpenFL, we've run into an old issue again:

https://github.com/jgranick/openfl-native/issues/145#issuecomment-59254331

The Extension.callbackHandler.post() method appears to run on the Android UI thread and not the Haxe thread. As a result, using it to trigger HaxeObject callbacks that then access OpenFL objects such as the stage causes an immediate thread-related crash.

The old fix was to trigger the HaxeObject callbacks via GLSurfaceView.queueEvent(). However, looking through the latest source, it seems GLSurfaceView is no longer being used, so that function is no longer available.

Is there a new way to have HaxeObject callbacks run on the Haxe (SDLMain?) thread instead of the Android UI thread?

Or alternatively, is there a safe way to cause the desired thread switch in Haxe code? (We tried having the HaxeObject callback start a Timer and triggering UI changes off that, but the crash remains...)

Update to the above: using a haxe.Timer doesn't prevent the crash, but going through the EventDispatcher system with openfl.Timer does seem to jump it onto the right thread. Doesn't seem quite right, but by using only local variables up to the point the EventDispatcher system is triggered we may be bypassing the issue.

Still would be nice to have a Handler to switch to the Haxe thread for Java > Haxe callbacks though I think?

I believe the purpose of Extension.callbackHandler is for making calls to JNI from the Java thread, so if the callbacks are executing on the wrong thread, let's see how we can fix the handler to work properly.

Does it make a difference if it is static, or how do you think it needs to change?

Looking through the Java source code templates in lime I think there are 2 cases that need to be covered:

  1. JNI calls made from Haxe into Java initially run on the Haxe thread, and usually need to switch to the Android main/UI thread for most native calls to work.

  2. JNI callbacks made from Java back into Haxe via HaxeObject would usually run on the Android main/UI thread (for example if triggered from Activity.onActivityResult()). And in these cases it needs to switch back to the Haxe thread to avoid crashes when manipulating OpenFL objects.

The Extension.callbackHandler appears to be created in GameActivity.onCreate() (see GameActivity.java:99). So I believe it would bind to the Android main/UI thread and provide a solution for (1) above. I think I've seen some examples around recommending the use of Extension.callbackHandler for this case, and changing it to another thread would probably break this.

So my current thinking is that we need another Handler available to extensions, but this time bound to the Haxe thread to solve (2). Looking at the code, I think it would have to be created in SDLActivity.java:1031 inside the SDLMain runnable, and then assigned back to SDLActivity so it can be made available to the Extension class.

I'm not sure how we could implement this "correctly" though given that SDLMain probably shouldn't be tweaking the internals of SDLActivity.

Is there any solution for this yet? I am currently calling haxe.Timer.delay when handling a callback from an OpenFL extension, so that the OpenFL GUI update happens later. I'm not sure whether that actually works, though.

Having looked into this, a Handler isn't the solution. Handler relies on Looper, which is Android's version of an event loop. This is a problem because you can't reasonably have more than one event loop per thread, and we already have MainLoop.

The solution, therefore, is to use the loop we have. You can do this with Timer.delay(), and it does seem to be thread-safe, but it also allocates a bunch of extraneous objects. A more efficient solution is MainLoop.runInMainThread(), which does exactly what it says.

Or not. MainLoop doesn't seem to process those events while Lime is running. Maybe it's waiting for Lime to return?

This is exactly why you don't try to run two event loops on the same thread. Each tries to loop forever, while expecting the other to return at regular intervals.
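
For anyone following along, the usual way around this is to keep a single loop and have it drain a thread-safe queue once per frame, instead of giving the thread a second loop. Here's a minimal sketch of that idea in plain Java. FrameQueue, post() and drain() are hypothetical names made up for illustration; none of this is Lime, OpenFL, or Android API, and where drain() would actually be called from in Lime is still an open question at this point in the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical helper, not part of Lime or OpenFL: any thread may post work,
// but only the thread that calls drain() ever runs it.
final class FrameQueue {
    private final ConcurrentLinkedQueue<Runnable> pending = new ConcurrentLinkedQueue<>();

    // Safe to call from any thread (for example the Android UI thread).
    // It only enqueues; the task is not executed here.
    void post(Runnable task) {
        pending.add(task);
    }

    // Meant to be called once per frame by the thread that owns the loop.
    // Runs queued tasks in the order they were posted and returns how many ran.
    int drain() {
        int ran = 0;
        Runnable task;
        while ((task = pending.poll()) != null) {
            task.run();
            ran++;
        }
        return ran;
    }

    public static void main(String[] args) throws InterruptedException {
        FrameQueue queue = new FrameQueue();
        List<String> ranOn = new ArrayList<>();

        // Stand-in for the UI thread delivering a callback.
        Thread ui = new Thread(
            () -> queue.post(() -> ranOn.add(Thread.currentThread().getName())),
            "ui");
        ui.start();
        ui.join();

        // Stand-in for one iteration of the existing loop.
        queue.drain();
        System.out.println("callback ran on: " + ranOn);
    }
}
```

The important property is that post() only enqueues, so the callback body never runs on the posting thread. One caveat: a task that keeps posting new tasks would keep drain() busy within a single frame, so a real version might cap the work done per call.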

So... MainLoop isn't our main loop. Ok, good to know. I'll go dig through Lime's code and see what we can use instead.

Actually, I know Lime's haxe.Timer is not thread-safe, but I modified it to be thread-safe using a Mutex. Only two files need to change:

  1. haxe\lib\lime\7,9,0\src\haxe\Timer.hx
  2. haxe\lib\lime\7,9,0\src\lime_internal\backend\native\NativeApplication.

Also, for extensionkit, modify ExtensionKit.hx, method CreateAndDispatchEvent to use haxe.Timer

Still, this is just my solution for hxcpp. Just FYI.
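
The patch itself isn't shown in this thread, so the following is only a sketch of the general technique described above (guarding a shared timer list with a mutex), written in plain Java rather than Haxe. LockedTimers, delay() and update() are hypothetical names, not Lime's Timer API, and the current time is passed in explicitly to keep the example deterministic.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for a timer list shared between threads; not Lime's Timer.
final class LockedTimers {
    private static final class Entry {
        final long fireAtMs;
        final Runnable callback;

        Entry(long fireAtMs, Runnable callback) {
            this.fireAtMs = fireAtMs;
            this.callback = callback;
        }
    }

    private final ReentrantLock lock = new ReentrantLock();
    private final List<Entry> entries = new ArrayList<>();

    // May be called from any thread; the lock protects the shared list.
    void delay(Runnable callback, long nowMs, long delayMs) {
        lock.lock();
        try {
            entries.add(new Entry(nowMs + delayMs, callback));
        } finally {
            lock.unlock();
        }
    }

    // Meant to be called by the main loop each frame.
    // Returns the number of callbacks fired on the calling thread.
    int update(long nowMs) {
        List<Entry> due = new ArrayList<>();
        lock.lock();
        try {
            Iterator<Entry> it = entries.iterator();
            while (it.hasNext()) {
                Entry e = it.next();
                if (e.fireAtMs <= nowMs) {
                    due.add(e);
                    it.remove();
                }
            }
        } finally {
            lock.unlock();
        }
        // Callbacks run after the lock is released, so a callback can call
        // delay() without touching the list while it is being iterated.
        for (Entry e : due) {
            e.callback.run();
        }
        return due.size();
    }

    public static void main(String[] args) {
        LockedTimers timers = new LockedTimers();
        timers.delay(() -> System.out.println("timer fired"), 0, 100);
        timers.update(50);   // not due yet
        timers.update(100);  // due now
    }
}
```

As with any lock-based fix, the callbacks still run on whichever thread calls update(), which is the point: other threads only ever touch the list, never the callback bodies.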

Right, I was looking at the wrong version of Timer. The version we use on threaded targets is also the version that isn't thread-safe. Yay!

I'm currently sitting on a pull request to make MainLoop compatible with Lime; once I submit it, we can use MainLoop.runInMainThread().

Pull request submitted!

Sorry to ask this here; I don't know where else to ask.

So, I have an Android app using OpenFL. The app has a thread that simply does a null access.
I built it in debug mode with the following flags in project.xml:

<define name="openfl-enable-handle-error" if="debug" />
<haxedef name="HXCPP_CHECK_POINTER" if="debug" /> <!--makes null references cause errors-->
<haxedef name="HXCPP_STACK_TRACE" if="debug" />
<haxedef name="HXCPP_STACK_LINE" if="debug" />
<haxedef name="HXCPP_DEBUG_LINK" if="debug" />

When it crashes, I pull the crash log from the device and run ndk-stack on it. The trace does not point to the null-access line; it only reaches the thread wrapper call, as shown below:

********** Crash dump: **********
Build fingerprint: 'samsung/m11qnnxx/m11q:11/RP1A.200720.012/M115FXXU3BVD1:user/release-keys'
pid: 6534, tid: 6815, name: SDLThread >>> net.ent.contactmanager <<<
signal 6 (SIGABRT), code -1 (SI_QUEUE), fault addr --------
Stack frame 05-31 15:55:47.545 6820 6820 F DEBUG : #00 pc 00065668 /apex/com.android.runtime/lib/bionic/libc.so (abort+172) (BuildId: 13bc715234d0861084dc092396cf9938)
Stack frame 05-31 15:55:47.546 6820 6820 F DEBUG : #1 pc 0203df3b /data/app/~~NjrXS18t4Ysa7VLsTF0gIg==/net.ent.contactmanager-Oial7NoNCEBS9suVVI9BIQ==/lib/arm/libApplicationMain.so (__gnu_cxx::__verbose_terminate_handler()+230): Routine __gnu_cxx::__verbose_terminate_handler() at /usr/local/google/buildbot/src/android/ndk-r15-release/toolchain/gcc/gcc-4.9/libstdc++-v3/libsupc++/vterminate.cc:95
Stack frame 05-31 15:55:47.547 6820 6820 F DEBUG : #2 pc 02012ee1 /data/app/~~NjrXS18t4Ysa7VLsTF0gIg==/net.ent.contactmanager-Oial7NoNCEBS9suVVI9BIQ==/lib/arm/libApplicationMain.so (__cxxabiv1::__terminate(void ()())+4): Routine __cxxabiv1::__terminate(void ()()) at /usr/local/google/buildbot/src/android/ndk-r15-release/toolchain/gcc/gcc-4.9/libstdc++-v3/libsupc++/eh_terminate.cc:47
Stack frame 05-31 15:55:47.547 6820 6820 F DEBUG : #3 pc 02012fe9 /data/app/~~NjrXS18t4Ysa7VLsTF0gIg==/net.ent.contactmanager-Oial7NoNCEBS9suVVI9BIQ==/lib/arm/libApplicationMain.so (std::terminate()+8): Routine std::terminate() at /usr/local/google/buildbot/src/android/ndk-r15-release/toolchain/gcc/gcc-4.9/libstdc++-v3/libsupc++/eh_terminate.cc:57 (discriminator 1)
Stack frame 05-31 15:55:47.547 6820 6820 F DEBUG : #4 pc 0201317b /data/app/~~NjrXS18t4Ysa7VLsTF0gIg==/net.ent.contactmanager-Oial7NoNCEBS9suVVI9BIQ==/lib/arm/libApplicationMain.so (__cxa_throw+110): Routine __cxa_throw at /usr/local/google/buildbot/src/android/ndk-r15-release/toolchain/gcc/gcc-4.9/libstdc++-v3/libsupc++/eh_throw.cc:87
Stack frame 05-31 15:55:47.547 6820 6820 F DEBUG : #5 pc 01e162f8 /data/app/~~NjrXS18t4Ysa7VLsTF0gIg==/net.ent.contactmanager-Oial7NoNCEBS9suVVI9BIQ==/lib/arm/libApplicationMain.so: Routine hx::Throw(Dynamic) at C:/HaxeToolkit/haxe/lib/hxcpp/4,2,1/src/hx/StdLibs.cpp:66 (discriminator 4)
Stack frame 05-31 15:55:47.547 6820 6820 F DEBUG : #6 pc 019b8e64 /data/app/~~NjrXS18t4Ysa7VLsTF0gIg==/net.ent.contactmanager-Oial7NoNCEBS9suVVI9BIQ==/lib/arm/libApplicationMain.so: Routine _hx_run at D:\git\ent\haxe\ContactManager\bin\android\obj/./src/util/_ENTThread/HaxeThread.cpp:129 (discriminator 2)
Stack frame 05-31 15:55:47.547 6820 6820 F DEBUG : #7 pc 019b8f24 /data/app/~~NjrXS18t4Ysa7VLsTF0gIg==/net.ent.contactmanager-Oial7NoNCEBS9suVVI9BIQ==/lib/arm/libApplicationMain.so: Routine __run at D:\git\ent\haxe\ContactManager\bin\android\obj/./src/util/_ENTThread/HaxeThread.cpp:139
Stack frame 05-31 15:55:47.547 6820 6820 F DEBUG : #8 pc 01df2880 /data/app/~~NjrXS18t4Ysa7VLsTF0gIg==/net.ent.contactmanager-Oial7NoNCEBS9suVVI9BIQ==/lib/arm/libApplicationMain.so: Routine hxThreadFunc(void*) at C:/HaxeToolkit/haxe/lib/hxcpp/4,2,1/src/hx/Thread.cpp:267 (discriminator 2)
Stack frame 05-31 15:55:47.547 6820 6820 F DEBUG : #9 pc 000b0567 /apex/com.android.runtime/lib/bionic/libc.so (__pthread_start(void*)+40) (BuildId: 13bc715234d0861084dc092396cf9938)
Stack frame 05-31 15:55:47.547 6820 6820 F DEBUG : #10 pc 00066b37 /apex/com.android.runtime/lib/bionic/libc.so (__start_thread+30) (BuildId: 13bc715234d0861084dc092396cf9938)

Don't pay attention to the ENTThread name. It is actually a copy of this file:
https://github.com/HaxeFoundation/haxe/blob/development/std/cpp/_std/sys/thread/Thread.hx

I copied the content over because I use an old Haxe version (4.0.5) but need the event loop feature of the latest Haxe Thread.

So, as you can see, it traces to line 130 in the Thread.hx file, but not to my null-access line.

So I'm not sure what I need to do to make ndk-stack trace to the exact C++ line that causes the crash. My problem is similar to the one described in this question:

https://community.haxe.org/t/cpp-target-crashes-without-any-error-messages/2785/4

Check in /data/tombstones for more detailed crash dumps.

$ adb shell ls /data/tombstones
tombstone_01 tombstone_02 tombstone_03 tombstone_04
tombstone_05 tombstone_06
$ adb pull /data/tombstones/tombstone_06 tombstone_06
$ ndk-stack -i tombstone_06 -sym .

Yes, I did use the tombstone. But there is still no info about the crashing line in the thread function, so I'm not sure what I am missing. @@

Then I think it's time to break out lldb. Recall that Export/android/bin is a fully-functional Android project that Android Studio can open, so in theory the instructions should work as written.

Ok. Thank you.

Oh wait, I just went and re-read your use case. All this time I thought you were debugging a specific error that only happened on Android, rather than trying to figure out how to approach the problem for future reference.

For future reference, I suggest debugging the cpp or hl target if you can reproduce the crash. Compilation is faster, and you have more options for debuggers. VS Code even has extensions to attach a debugger to a running program, though I've never gotten those to work personally.