optimizely/android-sdk

com.optimizely.ab.android.datafile_handler.DatafileService - ANR

akhilmohan opened this issue · 6 comments

Hi,

We found an issue in our production app. There are multiple occurrences of ANR being reported.

SDK version: 2.1.0

Below is the stack trace of the crash,

`
executing service com.myapp.prod/com.optimizely.ab.android.datafile_handler.DatafileService

com.optimizely.ab.android.datafile_handler.DatafileService
group="main" sCount=1 dsCount=0 obj=0x761a2298 self=0xb47a4500

sysTid=19837 nice=0 cgrp=apps sched=0/0 handle=0xb6f4eb4c
state=S schedstat=( 2908414587 1901277872 7208 ) utm=151 stm=139 core=3 HZ=100
stack=0xbe0d6000-0xbe0d8000 stackSize=8MB
held mutexes=at java.lang.Object.wait! (Native method)waiting on <0x070ae6c0> (a java.lang.Object)at java.lang.Thread.parkFor$ (Thread.java:1220)locked <0x070ae6c0> (a java.lang.Object)at sun.misc.Unsafe.park (Unsafe.java:299)at java.util.concurrent.locks.LockSupport.park (LockSupport.java:158)at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt (AbstractQueuedSynchronizer.java:810)at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly (AbstractQueuedSynchronizer.java:970)at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly (AbstractQueuedSynchronizer.java:1278)at java.util.concurrent.CountDownLatch.await (CountDownLatch.java:203)at android.app.SharedPreferencesImpl$EditorImpl$1.run (SharedPreferencesImpl.java:366)at android.app.QueuedWork.waitToFinish (QueuedWork.java:88)at android.app.ActivityThread.handleServiceArgs (ActivityThread.java:4068)at android.app.ActivityThread.access$2400 (ActivityThread.java:221)at android.app.ActivityThread$H.handleMessage (ActivityThread.java:1897)at android.os.Handler.dispatchMessage (Handler.java:102)at android.os.Looper.loop (Looper.java:158)at android.app.ActivityThread.main (ActivityThread.java:7225)at java.lang.reflect.Method.invoke! (Native method)at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run (ZygoteInit.java:1230)at com.android.internal.os.ZygoteInit.main (ZygoteInit.java:1120)"FinalizerWatchdogDaemon" tid=6 Sleeping "FinalizerWatchdogDaemon" daemon prio=5 tid=6 Sleeping
group="system" sCount=1 dsCount=0 obj=0x32c0d220 self=0xad918800
sysTid=19848 nice=0 cgrp=apps sched=0/0 handle=0xb341d930
state=S schedstat=( 1293228 15132450 9 ) utm=0 stm=0 core=0 HZ=100
stack=0xb331b000-0xb331d000 stackSize=1038KB
held mutexes=at java.lang.Thread.sleep! (Native method)sleeping on <0x0eee4d9f> (a java.lang.Object)at java.lang.Thread.sleep (Thread.java:1031)locked <0x0eee4d9f> (a java.lang.Object)at java.lang.Thread.sleep (Thread.java:985)at java.lang.Daemons$FinalizerWatchdogDaemon.sleepFor (Daemons.java:273)at java.lang.Daemons$FinalizerWatchdogDaemon.waitForFinalization (Daemons.java:284)at java.lang.Daemons$FinalizerWatchdogDaemon.run (Daemons.java:232)at java.lang.Thread.run (Thread.java:818)
`

Can someone shed some light on this ANR log?

@akhilmohan , thank you for reporting this. A ANR is not a crash. The app is not responding. In this case, if there is a problem with your data connection and datafile polling is enabled, it could be that the data file service was waiting to long for the cdn response. Right now, our timeout is 5 seconds. Can you tell us anything else about the issue such as what other threads might be doing at the time or the state of your app at the time? Thank you again.

I would also recommend updating to the latest version of the SDK.

We also have bunch of ANR's reported. Looks all are android 6.0 version.
SDK version: 2.1.2

Play console shows me this warning:

The "main" (tid=1) thread is in suspended state. This is likely due to garbage collection.

@pragyagrawal can you update to the latest SDK? We are on 3.1 now and releasing 3.2 soon.

@thomaszurkan-optimizely I'm having issues with a different looking ANR log for DatafileService. I've run it across Android 8, 9, and 10 with the same result. The app stalls out on an ANR when the app boots up while the phone is in Airplane mode.

To reproduce:

  1. Put phone into Airplane mode
  2. Run app that has Optimizely Android SDK 3.2.1
2019-09-13 14:44:51.142 19476-19503/? E/ActivityManager: ANR in com.example.internal
    PID: 20062
    Reason: executing service com.example.internal/com.optimizely.ab.android.datafile_handler.DatafileService
    Load: 1.24 / 0.93 / 0.72
    CPU usage from 110256ms to 0ms ago (2019-09-13 14:42:57.855 to 2019-09-13 14:44:48.111):
      11% 1814/surfaceflinger: 0.5% user + 11% kernel / faults: 115 minor 1 major
      5.6% 19476/system_server: 1.1% user + 4.4% kernel / faults: 18591 minor 63 major
      5.3% 19604/com.android.systemui: 1.2% user + 4.1% kernel / faults: 4420 minor 47 major
      1.6% 20000/com.google.android.gms.persistent: 0.6% user + 0.9% kernel / faults: 5039 minor 101 major
      1.6% 19896/com.google.android.apps.nexuslauncher: 0.4% user + 1.1% kernel / faults: 8057 minor 53 major
      1.6% 1799/android.hardware.graphics.composer@2.1-service: 0% user + 1.5% kernel / faults: 13 minor 2 major
      1.3% 20350/com.google.android.googlequicksearchbox:search: 0.3% user + 0.9% kernel / faults: 4408 minor 323 major
      1% 1802/android.hardware.sensors@1.0-service: 0% user + 1% kernel
      0.8% 3259/adbd: 0% user + 0.7% kernel / faults: 102 minor 1 major
      0.6% 19323/com.google.android.gm: 0.2% user + 0.4% kernel / faults: 3975 minor 7 major
      0.6% 19409/audioserver: 0% user + 0.6% kernel / faults: 3 minor 1 major
      0.6% 19414/android.hardware.audio@2.0-service: 0% user + 0.6% kernel
      0.5% 20167/com.google.android.gms: 0.1% user + 0.3% kernel / faults: 2870 minor 56 major
      0.3% 1678/logd: 0% user + 0.3% kernel / faults: 1 minor
      0.2% 19711/com.android.phone: 0.2% user + 0% kernel / faults: 474 minor 16 major
      0.1% 559/kswapd0: 0% user + 0.1% kernel
      0.1% 17722/kworker/u8:2: 0% user + 0.1% kernel
      0.1% 1855/statsd: 0% user + 0% kernel / faults: 21 minor
      0.1% 1812/lmkd: 0% user + 0.1% kernel
      0.1% 1821/logcat: 0% user + 0.1% kernel
      0.1% 19753/kworker/u8:5: 0% user + 0.1% kernel
      0.1% 1797/android.hardware.graphics.allocator@2.0-service: 0% user + 0.1% kernel / faults: 17 minor
      0.1% 19169/kworker/u8:0: 0% user + 0.1% kernel
      0% 19412/netd: 0% user + 0% kernel / faults: 691 minor
      0% 1701/jbd2/vdc-8: 0% user + 0% kernel
      0% 8/rcu_preempt: 0% user + 0% kernel
      0% 2006/dhcpclient: 0% user + 0% kernel
      0% 19314/kworker/u8:4: 0% user + 0% kernel
      0% 1783/android.system.suspend@1.0-service: 0% user + 0% kernel
      0% 1808/ashmemd: 0% user + 0% kernel
      0% 12548/android.process.acore: 0% user + 0% kernel / faults: 377 minor
      0% 15947/com.android.providers.calendar: 0% user + 0% kernel / faults: 794 minor 1 major
      0% 1796/android.hardware.gnss@1.0-service: 0% user + 0% kernel
      0% 15939/com.google.android.calendar: 0% user + 0% kernel / faults: 115 minor 39 major
      0% 18572/logcat: 0% user + 0% kernel
      0% 7/ksoftirqd/0: 0% user + 0% kernel
      0% 866/kworker/0:1H: 0% user + 0% kernel
      0% 1679/servicemanager: 0% user + 0% kernel
      0% 22529/com.android.chrome: 0% user + 0% kernel / faults: 24 minor 4 major
      0% 16413/kworker/0:0: 0% user + 0% kernel
      0% 18159/kworker/u8:1: 0% user + 0% kernel
      0% 19407/zygote64: 0% user + 0% kernel / faults: 192 minor 1 major
      0% 19676/com.google.android.networkstack: 0% user + 0% kernel / faults: 213 minor 5 major
      0% 13647/com.android.carrierdefaultapp: 0% user + 0% kernel / faults: 13 minor
      0% 19544/kworker/0:2: 0% user + 0% kernel
      0% 1/init: 0% user + 0% kernel / faults: 15 minor 1 major
      0% 9/rcu_sched: 0% user + 0% kernel
      0% 19397/kworker/u9:0: 0% user + 0% kernel
      0% 19866/com.google.android.inputmethod.latin: 0% user + 0% kernel / faults: 38 minor 9 major
      0% 20189/com.google.process.gservices: 0% user + 0% kernel / faults: 39 minor
      0% 15/watchdog/1: 0% user + 0% kernel
      0% 21/watchdog/2: 0% user + 0% kernel
      0% 1320/kworker/2:1H: 0% user + 0% kernel
      0% 1324/kworker/1:1H: 0% user + 0% kernel
      0% 1680/hwservicemanager: 0% user + 0% kernel / faults: 17 minor
      0% 1683/kworker/3:1H: 0% user + 0% kernel
      0% 1801/android.hardware.power@1.1-service.ranchu: 0% user + 0% kernel
      0% 1804/android.hardware.wifi@1.0-service: 0% user + 0% kernel / faults: 3 minor
      0% 1825/traced: 0% u

@thomaszurkan-optimizely we are experiencing these ANRs with SDK 3.2.1 in our app. I noticed that the linked fix #295 for this will only modify the JobService scheduling on Oreo and later devices, not those using AlarmManager (see ServiceScheduler.schedule())

We have only started seeing these ANRs since we enabled datafile polling recently. It appears that on devices < Oreo our app process will be restarted every datafile polling period, even if the app has been terminated by the OS. There are some rare cases where our app will ANR during the initialisation of the app, and it seems that the frequency of these ANRs has increased because the datafile service is being triggered by AlarmManager, which is starting the app in the background.

@fingertricks we believe these should be completely cleared up with the latest release 3.5.2.