rust-mobile/android-activity

Process is killed when closing before cleanup

Zoxc opened this issue · 8 comments

Zoxc commented

This even happens during the onSaveInstanceState callback. I wonder if the main thread terminates and takes the rest of the threads with it somehow.

rib commented

it could be good if you can point at a simple modification of an example that reproduces this reliably if possible.

It sounds like you're saying this will reproduce with something like a blocking thread::sleep(Duration::from_secs(1.0)) while handling MainEvent::SaveState in agdk-mainloop ?

How are you closing the app to test this?

I guess you're sure your app isn't panicing in the save state callback?

at least I don't think Android guarantees that onDestroy will be invoked and may just terminate the process and rely on the OS to clean up resources. It would only need to go through the full onDestroy life cycle if it was expecting to re-use the process.

the save state callbacks should be reliable though, and should be synchronized with java too to ensure that the system is going to wait for anything the application needs to do to save state - so it certainly sounds like a bug if you're seeing the process terminate before the app callback returns.

Zoxc commented

I added a sleep call to onSaveInstanceState_native (C++) with logging before and after, then closed my app by swiping up in the app switcher.

rib commented

right, I also just reproduced with this patch:

$ git diff
diff --git a/examples/agdk-mainloop/src/lib.rs b/examples/agdk-mainloop/src/lib.rs
index c45f186..d9c2bdf 100644
--- a/examples/agdk-mainloop/src/lib.rs
+++ b/examples/agdk-mainloop/src/lib.rs
@@ -26,7 +26,10 @@ fn android_main(app: AndroidApp) {
                         info!("Main event: {:?}", main_event);
                         match main_event {
                             MainEvent::SaveState { saver, .. } => {
+                                info!("Started Save State");
+                                std::thread::sleep(std::time::Duration::from_secs(5));
                                 saver.store("foo://bar".as_bytes());
+                                info!("Finished Save State");
                             }
                             MainEvent::Pause => {}
                             MainEvent::Resume { loader, .. } => {

and when swiping the app away in the task switcher I saw it log that it started to save state but didn't finish.

I don't know if 5 seconds is unreasonable or not here - I haven't see any documentation for a timeout but I suppose there might be some kind of limit where the system won't wait that long? How long did you try delaying for?
Did your logging show that the C++ code was correctly waiting on the condition variable in:

    while (!android_app->stateSaved) {
        pthread_cond_wait(&android_app->cond, &android_app->mutex);
    }

that's the main thing I'd be looking to double check - since that'll be what determines when the Java callback will return.

rib commented

ok, so with this patch to the C++ code:

diff --git a/android-activity/game-activity-csrc/game-activity/native_app_glue/android_native_app_glue.c b/android-activity/game-activity-csrc/game-activity/native_app_glue/android_native_app_glue.c
index 5fd3eb8..7293935 100644
--- a/android-activity/game-activity-csrc/game-activity/native_app_glue/android_native_app_glue.c
+++ b/android-activity/game-activity-csrc/game-activity/native_app_glue/android_native_app_glue.c
@@ -368,15 +368,17 @@ static void onResume(GameActivity* activity) {
 static void onSaveInstanceState(GameActivity* activity,
                                 SaveInstanceStateRecallback recallback,
                                 void* context) {
-    LOGV("SaveInstanceState: %p", activity);
+    LOGW("Start SaveInstanceState: %p", activity);

     struct android_app* android_app = ToApp(activity);
     pthread_mutex_lock(&android_app->mutex);
     android_app->stateSaved = 0;
     android_app_write_cmd(android_app, APP_CMD_SAVE_STATE);
+    LOGW("Waiting in SaveInstanceState: %p", activity);
     while (!android_app->stateSaved) {
         pthread_cond_wait(&android_app->cond, &android_app->mutex);
     }
+    LOGW("Finished waiting in SaveInstanceState: %p", activity);

     if (android_app->savedState != NULL) {
         // Tell the Java side about our state.

I then get a logcat like:

2022-08-25 22:32:50.521 5126-5126/co.realfit.agdkmainloop D/GameActivity: onSaveInstanceState_native
2022-08-25 22:32:50.521 5126-5126/co.realfit.agdkmainloop W/threaded_app: Start SaveInstanceState: 0x728902e580
2022-08-25 22:32:50.522 5126-5126/co.realfit.agdkmainloop W/threaded_app: Waiting in SaveInstanceState: 0x728902e580
2022-08-25 22:32:50.522 5126-5201/co.realfit.agdkmainloop I/main: Main event: SaveState { saver: StateSaver { app: AndroidAppInner { native_app: NativeAppGlue { ptr: 0x7273347f80 }, config: Configuration { mcc: 234, mnc: Some(10), lang: Some("en"), country: Some("GB"), orientation: Port, touchscreen: Finger, density: Some(420), keyboard: NoKeys, navigation: NoNav, keys_hidden: Soft, nav_hidden: Any, sdk_version: 28, screen_size: Normal, screen_long: Yes, ui_mode_type: Normal, ui_mode_night: No }, native_window: RwLock { data: None, poisoned: false, .. } } } }
2022-08-25 22:32:50.522 5126-5201/co.realfit.agdkmainloop I/main: Started Save State

So it looks like the C++ is waiting as it should be.

hmmm, not sure what to think here - this suggests Android is terminating the app even though the save state Java callback hasn't returned - so that sounds like a general Android thing, not really something android-activity is doing wrong itself.

Maybe there is some kind of time limit imposed on the save state callback to account for apps that lock up. Certainly 5 seconds is a pretty long delay for testing here and I can't imagine many apps taking even one second to save their state.

rib commented

Out of curiosity I went to check the Java code for GameActivity which looks like:

@Override
  protected void onSaveInstanceState(Bundle outState) {
    super.onSaveInstanceState(outState);
    byte[] state = onSaveInstanceStateNative(mNativeHandle);
    if (state != null) {
      outState.putByteArray(KEY_NATIVE_SAVED_STATE, state);
    }
  }

and I was initially wondering if they should only be chaining up at the end (in case calling into the parent class somehow notifies the system that the app has finished saving state), but the docs don't seem to imply that's the case. also out of curiosity I saw that the implementation for NativeActivity.java is identical.

This is a typic ANR case. We should not doing long time operation on main thread.

rib commented

Yeah it sounds very plausible/likely. It could perhaps be good to confirm this, while I don't recall seeing an ANR error in logcat when testing this.

Apparently the default ANR timeout is 5 seconds so maybe one way of being more sure that it's ANR that interrupting the save state here could be to see if there are still interruptions for e.g. 4 seconds.

rib commented

Lets assume that the testing was at issue here and the application was being terminated by ANR.

At the time we investigated it did indeed look like the Java synchronization primitives were working as intended, so whatever the reason for the app being terminated early seems likely to be a standard mechanism (such as ANR), not a bug with android-activity.