bitfireAT/ical4android

CI tests fail at testLargeTransactionManyRows

Closed this issue · 8 comments

See https://github.com/bitfireAT/ical4android/runs/6677616922

> Task :packageDebugAndroidTest
> Task :createDebugAndroidTestApkListingFileRedirect
[EmulatorConsole]: Failed to start Emulator console for 5554

> Task :connectedDebugAndroidTest
additionalTestOutput is not supported on this device running API level 28 because the additional test output directory could not be found
Starting 400 tests on emulator-5554 - 9

at.bitfire.ical4android.AndroidEventTest > testLargeTransactionManyRows[emulator-5554 - 9] FAILED 

Tests on emulator-5554 - 9 failed: There was 1 failure(s).

Test results saved as file:/__w/ical4android/ical4android/build/outputs/androidTest-results/connected/test-result.pb. Inspect these results in Android Studio by selecting Run > Import Tests From File from the menu bar and importing test-result.pb.

Since the commit on 26 May 2022: https://github.com/bitfireAT/ical4android/commits/main

Maybe this is not even related to the changes of the commit.

I am having a hard time reproducing this locally in android studio. Could I get some steps to reproduce? :)

Hm you can take the steps from the Github test workflow. It bases on an empty Android 9 emulator:
https://github.com/bitfireAT/docker-android-ci

And then the console commands from the workflow are run

I finally managed to reproduce failing of the test once or twice (running all emulator tests of ical4android on a clean Android 9 AVD). It was not reliably failing though. Sometimes other tests failed and sometimes no tests ran. Once I had them all pass as well. Running the single test testLargeTransactionManyRows did not fail at all. Neither did the test fail if all tests in the AndroidEventTest class were run together. It looked like the failing cases were caused by the emulator disconnecting at random. The logs, were not helpful.

Furthermore, after trying with a younger commit, all the tests passed and then continued to pass, even after switching back to the latest problematic commit mentioned (clean build and clearing caches). Trying with clean new AVDs did nothing either, all tests just continued to pass.

I think the issue is not related to the commit and I am unable to think of other steps to try that might reproduce the problem again.
Is this happening consistently on the emulator?

Okay, after reproducing locally we finally got some logs.

2022-06-14 13:49:31.910 3594-3618/at.bitfire.ical4android.test I/TestRunner: finished: testPopulateEvent_Status_Confirmed(at.bitfire.ical4android.AndroidEventTest)
2022-06-14 13:49:31.912 3594-3618/at.bitfire.ical4android.test I/TestRunner: started: testLargeTransactionManyRows(at.bitfire.ical4android.AndroidEventTest)
2022-06-14 13:49:31.915 3594-3618/at.bitfire.ical4android.test I/GrantPermissionCallable: Permission: android.permission.WRITE_CALENDAR is already granted!
2022-06-14 13:49:31.915 3594-3618/at.bitfire.ical4android.test I/GrantPermissionCallable: Permission: android.permission.READ_CALENDAR is already granted!
2022-06-14 13:49:31.918 3594-3618/at.bitfire.ical4android.test I/ical4android: Creating local calendar: name=TestCalendar allowedReminders=0,1,2 account_type=LOCAL allowedAttendeeTypes=0,2,1,3 calendar_displayName=ical4android Test Calendar account_name=ical4android@example.com allowedAvailability=0,1
2022-06-14 13:49:31.918 3347-3628/com.android.providers.calendar W/ContentResolver: Failed to get type for: content://com.android.calendar (Unknown URL content://com.android.calendar)
2022-06-14 13:49:31.930 3347-3358/com.android.providers.calendar I/viders.calenda: Background concurrent copying GC freed 57837(2MB) AllocSpace objects, 0(0B) LOS objects, 50% free, 2MB/5MB, paused 126us total 110.194ms
2022-06-14 13:49:32.506 3594-3605/at.bitfire.ical4android.test I/al4android.tes: Background concurrent copying GC freed 148213(6MB) AllocSpace objects, 0(0B) LOS objects, 20% free, 23MB/29MB, paused 118us total 419.471ms
2022-06-14 13:49:32.841 3594-3605/at.bitfire.ical4android.test I/al4android.tes: Background concurrent copying GC freed 340616(23MB) AllocSpace objects, 0(0B) LOS objects, 26% free, 16MB/22MB, paused 123us total 198.634ms
2022-06-14 13:49:33.140 3594-3618/at.bitfire.ical4android.test E/JavaBinder: !!! FAILED BINDER TRANSACTION !!!  (parcel size = 2448968)
2022-06-14 13:49:33.140 3594-3618/at.bitfire.ical4android.test W/ical4android: Transaction too large, splitting (losing atomicity)
2022-06-14 13:49:33.292 3594-3618/at.bitfire.ical4android.test E/JavaBinder: !!! FAILED BINDER TRANSACTION !!!  (parcel size = 1224356)
2022-06-14 13:49:33.292 3594-3618/at.bitfire.ical4android.test W/ical4android: Transaction too large, splitting (losing atomicity)
2022-06-14 13:49:33.485 3347-3630/com.android.providers.calendar W/ContentResolver: Failed to get type for: content://com.android.calendar (Unknown URL content://com.android.calendar)
2022-06-14 13:49:33.807 1685-1701/system_process I/system_server: Background concurrent copying GC freed 86005(3MB) AllocSpace objects, 16(576KB) LOS objects, 42% free, 7MB/13MB, paused 106us total 226.801ms
2022-06-14 13:49:33.944 3347-3358/com.android.providers.calendar I/viders.calenda: Background concurrent copying GC freed 79987(3MB) AllocSpace objects, 0(0B) LOS objects, 50% free, 5MB/10MB, paused 113us total 121.449ms
2022-06-14 13:49:34.537 3347-3358/com.android.providers.calendar I/viders.calenda: Background concurrent copying GC freed 158140(5MB) AllocSpace objects, 0(0B) LOS objects, 47% free, 6MB/12MB, paused 98us total 294.115ms
2022-06-14 13:49:34.626 1685-1701/system_process I/system_server: Background concurrent copying GC freed 76379(2MB) AllocSpace objects, 0(0B) LOS objects, 42% free, 8MB/14MB, paused 105us total 433.536ms
2022-06-14 13:49:35.094 3347-3358/com.android.providers.calendar I/viders.calenda: Background concurrent copying GC freed 188770(7MB) AllocSpace objects, 0(0B) LOS objects, 42% free, 8MB/14MB, paused 112us total 207.588ms
2022-06-14 13:49:35.311 1685-1701/system_process I/system_server: Background concurrent copying GC freed 79984(2MB) AllocSpace objects, 0(0B) LOS objects, 42% free, 8MB/14MB, paused 105us total 250.369ms
2022-06-14 13:49:35.364 3594-3618/at.bitfire.ical4android.test E/JavaBinder: !!! FAILED BINDER TRANSACTION !!!  (parcel size = 1224756)
2022-06-14 13:49:35.364 3594-3618/at.bitfire.ical4android.test W/ical4android: Transaction too large, splitting (losing atomicity)
2022-06-14 13:49:35.616 3594-3605/at.bitfire.ical4android.test I/al4android.tes: Background concurrent copying GC freed 259791(14MB) AllocSpace objects, 0(0B) LOS objects, 41% free, 8MB/14MB, paused 150us total 322.092ms
2022-06-14 13:49:35.912 3347-3358/com.android.providers.calendar I/viders.calenda: Background concurrent copying GC freed 228006(8MB) AllocSpace objects, 0(0B) LOS objects, 40% free, 8MB/14MB, paused 112us total 233.513ms
2022-06-14 13:49:36.367 1685-1701/system_process I/system_server: Background concurrent copying GC freed 87671(2MB) AllocSpace objects, 0(0B) LOS objects, 40% free, 8MB/14MB, paused 119us total 409.323ms
2022-06-14 13:49:36.554 3347-3358/com.android.providers.calendar I/viders.calenda: Background concurrent copying GC freed 243275(9MB) AllocSpace objects, 0(0B) LOS objects, 43% free, 7MB/13MB, paused 152us total 201.368ms
2022-06-14 13:49:36.689 1685-2086/system_process E/BpBinder: Too many binder proxy objects sent to uid 1000 from uid 10023 (6000 proxies held)
2022-06-14 13:49:36.711 1685-2086/system_process E/Parcel: Reading a NULL string not supported here.
2022-06-14 13:49:36.720 1642-2388/? W/: reached BBinder::getInterfaceDescriptor (this=0xf11b6160)
2022-06-14 13:49:36.729 1685-2086/system_process E/Parcel: Reading a NULL string not supported here.
2022-06-14 13:49:36.771 1685-2086/system_process I/chatty: uid=1000(system) Binder:1685_6 identical 8 lines
2022-06-14 13:49:36.777 1685-2086/system_process E/Parcel: Reading a NULL string not supported here.
2022-06-14 13:49:36.780 1685-2086/system_process V/Binder: BinderProxy descriptor histogram (top ten):
2022-06-14 13:49:36.780 1685-2086/system_process V/Binder:  #1: android.database.IContentObserver x6045
2022-06-14 13:49:36.780 1685-2086/system_process V/Binder:  #2: android.content.IIntentReceiver x101
2022-06-14 13:49:36.780 1685-2086/system_process V/Binder:  #3: android.content.IContentProvider x29
2022-06-14 13:49:36.780 1685-2086/system_process V/Binder:  #4:  x17
2022-06-14 13:49:36.780 1685-2086/system_process V/Binder:  #5: android.app.IApplicationThread x17
2022-06-14 13:49:36.780 1685-2086/system_process V/Binder:  #6: android.hardware.display.IDisplayManagerCallback x17
2022-06-14 13:49:36.780 1685-2086/system_process V/Binder:  #7: com.android.internal.telephony.IOnSubscriptionsChangedListener x11
2022-06-14 13:49:36.780 1685-2086/system_process V/Binder:  #8: android.os.IMessenger x7
2022-06-14 13:49:36.780 1685-2086/system_process V/Binder:  #9: android.view.IWindow x6
2022-06-14 13:49:36.780 1685-2086/system_process V/Binder:  #10: android.app.IServiceConnection x5
2022-06-14 13:49:36.782 1685-1758/system_process E/ActivityManager: Uid 10023 sent too many Binders to uid 1000
2022-06-14 13:49:36.782 1685-1758/system_process I/ActivityManager: Killing 3347:com.android.providers.calendar/u0a23 (adj 0): Too many Binders sent to SYSTEM
2022-06-14 13:49:36.782 1685-1758/system_process I/ActivityManager: Killing 3594:at.bitfire.ical4android.test/u0a67 (adj 0): depends on provider com.android.providers.calendar/.CalendarProvider2 in dying proc com.android.providers.calendar (adj 0)
2022-06-14 13:49:36.783 1685-1770/system_process W/libprocessgroup: kill(-3347, 9) failed: No such process
2022-06-14 13:49:36.808 1685-3341/system_process W/WindowManager: removeWindowToken: Attempted to remove non-existing token: android.os.Binder@b360861
2022-06-14 13:49:36.808 1685-3341/system_process W/BinderNative: Uncaught exception from death notification
    java.lang.NullPointerException: Attempt to read from field 'android.accessibilityservice.IAccessibilityServiceClient com.android.server.accessibility.AbstractAccessibilityServiceConnection.mServiceInterface' on a null object reference
        at com.android.server.accessibility.UiAutomationManager.destroyUiAutomationService(UiAutomationManager.java:179)
        at com.android.server.accessibility.UiAutomationManager.access$200(UiAutomationManager.java:41)
        at com.android.server.accessibility.UiAutomationManager$UiAutomationService.binderDied(UiAutomationManager.java:230)
        at android.os.BinderProxy.sendDeathNotice(Binder.java:1193)
2022-06-14 13:49:36.808 1685-3412/system_process W/ActivityManager: Crash of app at.bitfire.ical4android.test running instrumentation ComponentInfo{at.bitfire.ical4android.test/androidx.test.runner.AndroidJUnitRunner}
2022-06-14 13:49:36.808 1685-3412/system_process I/ActivityManager: Force stopping at.bitfire.ical4android.test appid=10067 user=0: finished inst
2022-06-14 13:49:36.809 1685-1685/system_process V/SettingsProvider: Notifying for 0: content://settings/secure/accessibility_enabled
2022-06-14 13:49:36.811 3585-3593/? W/Binder: Binder call failed.
    java.lang.SecurityException: Calling from not trusted UID!
        at android.app.UiAutomationConnection.throwIfCalledByNotTrustedUidLocked(UiAutomationConnection.java:468)
        at android.app.UiAutomationConnection.shutdown(UiAutomationConnection.java:372)
        at android.app.IUiAutomationConnection$Stub.onTransact(IUiAutomationConnection.java:222)
        at android.os.Binder.execTransact(Binder.java:731)
2022-06-14 13:49:36.812 1643-1643/? I/Zygote: Process 3594 exited due to signal (9)
2022-06-14 13:49:36.814 1685-3635/system_process W/Binder: Outgoing transactions from this process must be FLAG_ONEWAY
    java.lang.Throwable
        at android.os.BinderProxy.transact(Binder.java:1114)
        at android.app.IInstrumentationWatcher$Stub$Proxy.instrumentationFinished(IInstrumentationWatcher.java:164)
        at com.android.server.am.InstrumentationReporter$MyThread.run(InstrumentationReporter.java:86)
2022-06-14 13:49:36.814 3585-3585/? D/AndroidRuntime: Shutting down VM
2022-06-14 13:49:36.827 1685-1770/system_process W/libprocessgroup: kill(-3347, 9) failed: No such process
2022-06-14 13:49:36.831 1643-1643/? I/Zygote: Process 3347 exited due to signal (9)
2022-06-14 13:49:36.867 1685-1770/system_process W/libprocessgroup: kill(-3347, 9) failed: No such process
2022-06-14 13:49:36.867 1685-1770/system_process I/libprocessgroup: Successfully killed process cgroup uid 10023 pid 3347 in 84ms
2022-06-14 13:49:36.867 1685-1770/system_process W/libprocessgroup: kill(-3594, 9) failed: No such process
2022-06-14 13:49:36.867 1685-1770/system_process I/libprocessgroup: Successfully killed process cgroup uid 10067 pid 3594 in 0ms
2022-06-14 13:49:38.417 3636-3636/? W/viders.calenda: Unexpected CPU variant for X86 using defaults: x86
2022-06-14 13:49:38.421 1685-1769/system_process I/ActivityManager: Start proc 3636:com.android.providers.calendar/u0a23 for content provider com.android.providers.calendar/.CalendarProvider2
2022-06-14 13:49:38.430 3636-3636/? I/viders.calenda: The ClassLoaderContext is a special shared library.
2022-06-14 13:49:38.436 3636-3636/? I/CalendarProvider2: Created com.android.providers.calendar.CalendarAlarmManager@f230017(com.android.providers.calendar.CalendarProvider2@4b83c04)
2022-06-14 13:49:42.828 1655-1753/? E/storaged: getDiskStats failed with result NOT_SUPPORTED and size 0
2022-06-14 13:49:42.929 1685-2086/system_process E/memtrack: Couldn't load memtrack module
2022-06-14 13:49:42.929 1685-2086/system_process W/android.os.Debug: failed to get memory consumption info: -1
2022-06-14 13:49:43.462 3636-3650/com.android.providers.calendar I/viders.calenda: Waiting for a blocking GC ProfileSaver

For future investigation/reference:

It is most certainly a Android 9 (api28) related issue, where the limit of android binders possible to send to the system is exceeded due to missing ram or some other quirk in our docker-android-ci image or emulator configuration. Some test (most often testLargeTransactionManyRows) fails as soon to said limit is hit. Using only 3000 instead of 4000 attendees in the test will still result in one transaction split (which we desire to test), but may still have the test fail occasionally... I also observed a scenario where 4000 attendees worked just fine, just to fail again on a second run.

10/10 consecutive runs with 10.000 attendees worked out like the below scenario shown in the screenshot. I was unable to test Android 11, 12 or 13 (api levels 30-33) as they were missing in the sdkmanager/avdmanager for x86.

docker emulators+one studio emulator (Pixel):

image

only studio emulators:

image

--

We could investigate further to fix the Android 9 on docker emulator related problem or just use a different Android version for our CI tests. For the future, maybe we could even have multiple android versions tested on the server, so as to catch issues like this one right away?

--

EDIT: Android 10 on docker fails serverside too, not locally though ...

Not the exact same logcat messages, but worth looking through to see when android kills our app because of too many binders
https://cs.android.com/android/platform/superproject/+/master:frameworks/base/services/core/java/com/android/server/am/ActivityManagerService.java;l=7798

I wonder whether it would help to

  • move the BatchOperation test to a new test class (should be done anyway) and then
  • force GC before / after every test in the BatchOperationTest class (maybe even with some sleep).

Maybe forced garbage collecting closes the binder references so that they don't become too many?

  • force GC before / after every test in the BatchOperationTest class (maybe even with some sleep).

Maybe forced garbage collecting closes the binder references so that they don't become too many?

GC does indeed help, if we were to use it inside the BatchOperation class ie when the transaction split occurs., but it would be hacky and unreliable, because we can not be certain gc will actually happen.

Running the test with 4000 attendees by itself causes the the Binder limit to be hit and the test to fail, so requesting GC before/after does not help unfortunately. :/

For now I will leave two tests:

  • one with only 2000 attendees only, which I expect to succeed in any case.
  • one with 4000 attendees, which I will mark as flaky and then also exempt flaky tests from running in the CI