launchdarkly/java-server-sdk

Excessive logging for invalid SDK key

seangatewood opened this issue ยท 7 comments

Is this a support request?
No

Describe the bug
When an LDClient has a bad SDK key (because it expired or was bad from the beginning), it repeatedly logs about it and triggers any DataSourceStatusProvider.StatusListeners

To reproduce

  • Initialize a LDClient connected to environment X
  • Wait for log to say INFO com.launchdarkly.sdk.server.LDClient.DataSource - Initialized LaunchDarkly client.
    • (I set a breakpoint after initializing it and configured to stop only the test thread)
  • Go into LaunchDarkly and reset environment X's SDK key (keep old key around for 0 hours)
    image
  • The log will repeatedly say ERROR com.launchdarkly.sdk.server.LDClient.DataSource - Error in stream connection (giving up permanently): HTTP error 401 (invalid SDK key)
    image
  • Upon further inspection, it seems this bug also occurs if you just set the SDK key to any random string from the beginning like pizza

The same issue exists for StatusListeners
We were also considering setting up alerts for sites that have expired SDK keys, and we were considering using the DataSourceStatusProvider for that.

However, if I add this code into my test:

DataSourceStatusProvider dataSourceStatusProvider = client.getDataSourceStatusProvider();
dataSourceStatusProvider.addStatusListener(status -> {
  DataSourceStatusProvider.State state = status.getState();
  if (state == DataSourceStatusProvider.State.OFF) {
    System.out.println("\033[1;33m" + "My listener was called!" + "\033[0m"); // yellow
  }
});

then you will see that the listener is also repeatedly called:
image

Upon further inspection, if I make my listener always print what the State was:

status -> {
  DataSourceStatusProvider.State state = status.getState();
  System.out.println("\033[1;33m" + "My listener was called! " + state.name() + "\033[0m"); // yellow
}

then it seems to oscillate between INTERRUPTED and OFF:
image

Expected behavior
If it really "gives up permanently" (which it should do for an invalid SDK key), then it should only log that message once. ๐Ÿ™‚ That also seems to be what this code is trying to do?

Also we kinda expected addStatusListener to only trigger for status changes, which should only be once for a non-recoverable error like having an invalid SDK key.

Logs

[Test worker] INFO com.launchdarkly.sdk.server.LDClient.DataSource - Enabling streaming API
[Test worker] INFO com.launchdarkly.sdk.server.LDClient.DataSource - Starting EventSource client using URI: <hidden>
[Test worker] INFO com.launchdarkly.sdk.server.LDClient - Waiting up to 5000 milliseconds for LaunchDarkly client to start...
[okhttp-eventsource-stream-[]-0] INFO com.launchdarkly.sdk.server.LDClient.DataSource - Connected to EventSource stream.
[okhttp-eventsource-events-[]-0] INFO com.launchdarkly.sdk.server.LDClient.DataSource - Initialized LaunchDarkly client.
[okhttp-eventsource-stream-[]-0] WARN com.launchdarkly.sdk.server.LDClient.DataSource - Connection unexpectedly closed
[okhttp-eventsource-stream-[]-0] WARN com.launchdarkly.sdk.server.LDClient.DataSource - Error in stream connection (will retry): java.io.EOFException
[okhttp-eventsource-stream-[]-0] INFO com.launchdarkly.sdk.server.LDClient.DataSource - Waiting 1477 milliseconds before reconnecting...
[okhttp-eventsource-stream-[]-0] ERROR com.launchdarkly.sdk.server.LDClient.DataSource - Error in stream connection (giving up permanently): HTTP error 401 (invalid SDK key)
[okhttp-eventsource-stream-[]-0] WARN com.launchdarkly.sdk.server.LDClient.DataSource - Connection unexpectedly closed
[okhttp-eventsource-stream-[]-0] WARN com.launchdarkly.sdk.server.LDClient.DataSource - Error in stream connection (will retry): java.io.EOFException
[okhttp-eventsource-stream-[]-0] INFO com.launchdarkly.sdk.server.LDClient.DataSource - Waiting 3682 milliseconds before reconnecting...
[okhttp-eventsource-stream-[]-0] ERROR com.launchdarkly.sdk.server.LDClient.DataSource - Error in stream connection (giving up permanently): HTTP error 401 (invalid SDK key)
[okhttp-eventsource-stream-[]-0] WARN com.launchdarkly.sdk.server.LDClient.DataSource - Connection unexpectedly closed
[okhttp-eventsource-stream-[]-0] WARN com.launchdarkly.sdk.server.LDClient.DataSource - Error in stream connection (will retry): java.io.EOFException
[okhttp-eventsource-stream-[]-0] INFO com.launchdarkly.sdk.server.LDClient.DataSource - Waiting 4075 milliseconds before reconnecting...
[okhttp-eventsource-stream-[]-0] ERROR com.launchdarkly.sdk.server.LDClient.DataSource - Error in stream connection (giving up permanently): HTTP error 401 (invalid SDK key)
[okhttp-eventsource-stream-[]-0] WARN com.launchdarkly.sdk.server.LDClient.DataSource - Connection unexpectedly closed
[okhttp-eventsource-stream-[]-0] WARN com.launchdarkly.sdk.server.LDClient.DataSource - Error in stream connection (will retry): java.io.EOFException
[okhttp-eventsource-stream-[]-0] INFO com.launchdarkly.sdk.server.LDClient.DataSource - Waiting 14868 milliseconds before reconnecting...
[okhttp-eventsource-stream-[]-0] ERROR com.launchdarkly.sdk.server.LDClient.DataSource - Error in stream connection (giving up permanently): HTTP error 401 (invalid SDK key)
...

SDK version
5.3.0, but also confirmed the error exists on 5.5.0

Language version, developer tools
Java 8, IntelliJ 2021.1.1 Community Edition

OS/platform
MacBook Pro 2019
macOs Catalina
Version 10.15.7

Additional context
This causes us concern since we have a lot of long-running sites that may begin to operate on a stale SDK key if we rotate the SDK key. While this is mostly fine, since the sites will just stop getting updates from LaunchDarkly, the excessive logging could be a problem for our disk space utilization and logging quality.

Whoops, yes, this is definitely a bug. As you guessed, the code is supposed to treat a 401 as a permanent failure (since an invalid SDK key can never become valid again) and something in that logic is clearly broken now. Which I guess means the test coverage for it is also wrong somehow, since I'm fairly sure we have unit tests specifically for this. Investigating.

Indeed, we had a test, but it wasn't a valid test. The way we were setting up mock server responses wasn't adequately realisticโ€” we're currently working on implementing better end-to-end tests to catch just this sort of thing. Anyway, sorry about this and it should be a straightforward fix; we're working on it now.

Version 5.5.1 has been released and should fix this.

And thanks for bringing it to our attention.

I have confirmed on my end that 5.5.1 fixes the test I was running ๐ŸŽ‰ Thank you! ๐Ÿ˜ƒ
image

@eli-darkly We are using launchdarkly-java-server-sdk 5.6.3 and are still seeing multiple attempts being made to post diagnostics, even though the requests fail with a 401 because of an invalid SDK key.

afbeelding

Should I open a new bug report or am I missing something?

@frederikvanhoyweghen That looks like it is a different bugโ€” that is, it's the same kind of issue, but it's in a different component that was not what the original bug report was about. If you wouldn't mind filing a separate issue, that'd be helpful.