DataDog/dd-sdk-flutter

No implementation found for method addError on channel datadog_sdk_flutter.rum

androidmitry opened this issue · 37 comments

Stack trace

Fatal Exception: io.flutter.plugins.firebase.crashlytics.FlutterError: MissingPluginException(No implementation found for method addError on channel datadog_sdk_flutter.rum)
at MethodChannel._invokeMethod(platform_channel.dart:332)
at ._willHandleError(helpers.dart:14)

Reproduction steps

Add the datadog_flutter_plugin package, release to App Store

Volume

0,0021 (1-2 users per day)

Affected SDK versions

2.4.0

Does the crash manifest in the latest SDK version?

Yes

Flutter Version

3.19.5

Setup Type

Flutter Application

Device Information

OS - Android
per version:
Android 12 - 88%
Android 10 - 7%
Android 14 - 3%
Android 13 - 2%

per device:
Samsung - 93%
Oneplus - 7%

Other relevant information

Device states: background 60%

Hi @androidmitry ,

Thanks for the report, I'll look into this as soon as I can.

Can you give me any more information about a possible reproduction? Have you been able to reproduce locally at all? Is there anything strange about your setup that might be disconnecting the MethodChannel from our plugin? We tend to wrap every call we make to try to avoid crashes, so I'm very concerned that this is causing a crash....

Hi @fuzzybinary, unfortunately that's all the information I have so far. I wasn't able to reproduce it. We had some custom platform code, but it was removed. What's interesting is that the number of reports is decreasing. I will update the issue if the crash goes away.

@androidmitry Yeah if you can keep me posted I would appreciate it.

I've seen issues in the past where the method channel can get disconnected from the plugin, but I've fixed those, and most threw errors in the native layer, not Dart.

This issue is also occurring in version 2.1.0, and we have encountered the MissingPluginException from the Android channels datadog_sdk_flutter.rum and datadog_sdk_flutter.logs in the production release. Due to consecutive RUM events, the error count is excessively high. Below are some error messages we've received:

  1. MissingPluginException(No implementation found for method addError on channel datadog_sdk_flutter.rum)
  2. MissingPluginException(No implementation found for method createLogger on channel datadog_sdk_flutter.logs)
  3. MissingPluginException(No implementation found for method stopView on channel datadog_sdk_flutter.rum)

Hi @nirmal0707,

I'm actively investigating this, but I haven't had much luck reproducing it. Do you happen to have any steps to reproduce, or anything you can tell me about your app before / after you started seeing the errors?

Hi @fuzzybinary ,

This issue was not reproducible but began occurring when we migrated our codebase to Flutter 3.16.4, three months ago. Previously, we were using version 1.5.1, and the Flutter upgrade required us to move the package version to 2.1.0, resulting in this issue arising for some users in production.

Alright, thanks @nirmal0707. That may help me track down the issue.

Hi folks -- a few questions for everyone to see if I can try to diagnose this:

  • Is anyone using background tasks or foreground services, or the flutter_background_service?
  • Are you using push notifications or a push notification service like firebase_cloud_messaging? Do these errors tend to spike immediately after a push notification is sent out?
  • Is anyone using Flutter in an add to app scenario, or using attachToExisting in the SDK?
  • Does GeneratedPluginRegistrant.java enclose all the plugins in a try/catch block?
  • Do the MissingPluginException errors correlate with any other errors around the same time?

Sorry this is taking so long, but I am having a really hard time reproducing, even when forcing certain error states. Comparing with Crashlytics, we perform the registration and de-registration of our method channels the same way they do, so I'm not sure how or why they'd catch the errors and we don't.

Another question as I continue to investigate -- Does anyone have any customizations of their FlutterActivity? Overriding onCreate, configureFlutterEngine, onDestroy or any other methods?

For us, crash reports started coming in when we upgraded Flutter from 3.16.9 to 3.19.5.

Is anyone using background tasks or foreground services

We have a foreground service, but we don't use the flutter_background_service package. Also, according to the breadcrumb events attached to the crash, it usually happens in the foreground.

Are you using push notifications or a push notification service like firebase_cloud_messaging? Do these errors tend to spike immediately after a push notification is sent out?

Yes. No.

Is anyone using Flutter in an add to app scenario, or using attachToExisting in the SDK?

No

Does GeneratedPluginRegistrant.java enclose all the plugins in a try/catch block?

Yes

Do the MissingPluginException errors correlate with any other errors around the same time?

Checked several users and no other issues were reported around the same time.

Does anyone have any customizations of their FlutterActivity?

We do, I will double check them.

My error message is a bit different: MissingPluginException(No implementation found for method reportLongTask on channel datadog_sdk_flutter.rum)

These are my Sentry logs:

[image: Sentry log screenshot]

Then a bunch of:

[image: Sentry log screenshot]

And then:

[image: Sentry log screenshot]

Maybe you are not handling the destroyed lifecycle correctly? Or another plugin is interfering?

Hi @feinstein, thanks for the additional information. All of the MissingPluginException issues are related, regardless of which method channel is named and which method is recorded, so any additional info is helpful.

The FlutterJNI error is interesting, that wouldn't be us so I'm very curious what might cause that, and curious if they're related.

We actually don't handle activity lifecycle at all, instead relying on Flutter's onAttachedToEngine and onDetachedFromEngine, which is what makes this error so frustrating, as those should be triggered properly when Flutter itself starts and stops.

Have you been able to reproduce locally at all?

AFAIK Flutter JNI is the Java interop for connecting the C++ Flutter engine to the Android app.

Maybe Flutter is not triggering the engine's life cycle correctly to your lib.

We were not able to reproduce it locally. We made some tiny changes to our FlutterActivity, I will report if it helped.

@fuzzybinary just noting that we are still experiencing the issue mentioned in #552 (which I believe is the same issue being tracked here) despite removing the native cruft I referred to in my last comment on that issue. IIRC I am able to reproduce this in our application fairly consistently. If I have a sec today I'll play around and see if I can reproduce. According to another engineer on my team we're seeing ~249k instances of this issue per week. We've had to filter these issues out of our crash reporting to avoid going beyond our contracted threshold 🙃

STR would be ridiculously helpful. If I can reproduce I can likely get it fixed and out with the next version ASAP.

@fuzzybinary Just chiming in again on @nirmal0707's behalf. Looking at our Sentry error logs, we also see a large number of lifecycle events being reported in quick succession in the error events for this:

[image: Sentry breadcrumbs screenshot]

And the above screenshot is only about a quarter of the pause/resume breadcrumb events in that particular Sentry error event.

Not sure if that's relevant, but perhaps this rapid set of lifecycle events causes some sort of race condition in the Datadog plugin's setup code?

This looks weird, so many transitions in under 1 second.

What makes me exclude a Flutter error is that only the DD plugin is raising this exception.... but on second thought, few packages would trigger a method channel call when the app is being destroyed.

Another question from research:

Is anyone suffering from this error still using runZonedGuarded over PlatformDispatcher.instance.onError? (If you are using Datadog.runApp we do not use runZonedGuarded)

I'm looking for commonalities here, since I cannot reproduce with any example I have, but all of my examples use PlatformDispatcher.

We use runZonedGuarded. Is it deprecated? We set PlatformDispatcher.instance.onError as well.

PlatformDispatcher.instance.onError is preferred and the two do essentially the same thing.

I'm going to do more research but I'm curious if the new zone creation is occasionally bypassed by backgrounding / foregrounding.

Tests on my side related to runZonedGuarded don't duplicate the issue unfortunately.
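
For readers comparing the two error-handling approaches discussed above, here is a minimal sketch of each. It is not taken from the Datadog SDK: `reportError`, `mainWithZone`, and `mainWithPlatformDispatcher` are hypothetical names, and the `SizedBox` root widget is only a stand-in for a real app.

```dart
import 'dart:async';
import 'dart:ui';

import 'package:flutter/foundation.dart';
import 'package:flutter/widgets.dart';

// Hypothetical reporting hook; a real app would forward to its crash reporter.
void reportError(Object error, StackTrace stack) {
  debugPrint('Uncaught error: $error\n$stack');
}

// Option A: run the app inside a guarded zone. Uncaught asynchronous errors
// raised inside the zone are delivered to reportError.
void mainWithZone() {
  runZonedGuarded(() {
    // The binding is initialized inside the same zone that calls runApp.
    WidgetsFlutterBinding.ensureInitialized();
    runApp(const SizedBox());
  }, reportError);
}

// Option B: register a global handler on the PlatformDispatcher instead of
// creating a new zone.
void mainWithPlatformDispatcher() {
  WidgetsFlutterBinding.ensureInitialized();
  PlatformDispatcher.instance.onError = (Object error, StackTrace stack) {
    reportError(error, stack);
    return true; // Returning true marks the error as handled.
  };
  runApp(const SizedBox());
}
```

An app would call only one of these from `main()`. Per the thread, setting both is not known to cause the MissingPluginException.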

Next question -- is everyone experiencing this potentially using multiple Flutter engines or booting engines themselves for any reason? There is a potentially related Flutter issue if so. Doing a quick scan of the issue it's possible we might be able to fix this on the Datadog side, but knowing would help me focus efforts.

Thanks for your continued efforts on this @fuzzybinary ! 👍

For our app we are not using multiple Flutter engines and we do use runZonedGuarded, though it seems that's likely not the source of the issue from your last comment.

I am also using runZonedGuarded, I initialize Sentry, then DataDog. Here's how I initialize it:

Future<void> setupDatadog() async {
  final configuration = DatadogConfiguration(
    clientToken: 'mytoken1234',
    env: appFlavor ?? 'no-flavour',
    site: DatadogSite.us5,
    nativeCrashReportEnabled: true,
    loggingConfiguration: DatadogLoggingConfiguration(),
    rumConfiguration: DatadogRumConfiguration(
      applicationId: 'my-app-id-1234',
    ),
  );

  final originalOnError = FlutterError.onError;
  FlutterError.onError = (details) {
    DatadogSdk.instance.rum?.handleFlutterError(details);
    originalOnError?.call(details); // This allows me to not override other listeners, like Sentry.
  };
  final platformOriginalOnError = PlatformDispatcher.instance.onError;
  PlatformDispatcher.instance.onError = (e, st) {
    DatadogSdk.instance.rum?.addErrorInfo(
      e.toString(),
      RumErrorSource.source,
      stackTrace: st,
    );
    return platformOriginalOnError?.call(e, st) ?? false;
  };

  await DatadogSdk.instance.initialize(configuration, TrackingConsent.granted);
  DatadogSdk.instance.updateConfigurationInfo(LateConfigurationProperty.trackErrors, true);
}

That function is called inside a runZonedGuarded, after await SentryFlutter.init and WidgetsFlutterBinding.ensureInitialized();.

Hi folks - we still cannot reproduce this issue unfortunately. My guess is that this is some sort of race condition on the platform channel during backgrounding, where we are attempting to send view or log events while the app is backgrounding on Android.

However, I will say we do know that even though Sentry / Crashlytics report this as a “Fatal” error, it does not result in the application terminating, and is silent to the user. I verified this by essentially “force disconnecting” the method channel during testing and seeing what the response is from Flutter. This means that users are not seeing a degraded app experience because of this issue.

This doesn’t mean we don’t take the issue seriously, and if anyone can provide us with reproduction steps that would be incredibly helpful.
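
To illustrate why the error is non-fatal: a MissingPluginException is an ordinary Dart exception thrown by the method channel call, so it can be caught like any other. The following is a minimal sketch, not the SDK's actual wrapper. The channel name `example.plugin/channel` and the helper `safeInvoke` are hypothetical.

```dart
import 'package:flutter/foundation.dart';
import 'package:flutter/services.dart';

// Hypothetical channel name, used only for illustration.
const MethodChannel _channel = MethodChannel('example.plugin/channel');

/// Invokes [method] on the channel and returns null instead of throwing
/// when no native handler is registered for it.
Future<T?> safeInvoke<T>(String method, [Object? arguments]) async {
  try {
    return await _channel.invokeMethod<T>(method, arguments);
  } on MissingPluginException catch (e) {
    // No native implementation answered the call, for example because the
    // plugin is not attached to the engine at this moment.
    debugPrint('Channel call "$method" dropped: ${e.message}');
    return null;
  }
}
```

If the exception is left uncaught, it reaches whichever global error handler is installed, which would explain why crash reporters record it even though the process keeps running.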

Maybe contact the flutter team and ask them what might be causing this?

I've gone through some of the less formal channels (Discord, for example), but I may raise a GitHub issue and see if it gets more attention.

None of the changes I made previously helped. We are planning a Flutter SDK upgrade. I will post here if it helps.