launchdarkly/js-client-sdk

Infinite flag changes through SDK and socket

Closed this issue · 5 comments

Is this a support request?

No, it's an observation we made during an incident; the problem has since disappeared.

Describe the bug

We have now noticed twice that LaunchDarkly keeps flushing flag updates to our app through the SDK and flopflip (the library we use). This leads to endless changes in the UI, e.g. buttons appearing and disappearing.

To reproduce

The problem occurred twice in two weeks and resolved itself after 20-30 minutes without any change to our code base.

Ideas

We are wondering if the SDKs could use an exponential back-off: if the backing LaunchDarkly APIs keep streaming flag changes too frequently over a certain period of time, the SDK would notify the consumer through the change handlers at a decreasing frequency.
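
To make the idea concrete, here is a minimal sketch of such a gate. It is not part of the LaunchDarkly SDK; the name `createNotificationBackoff` and the `baseMs`/`maxMs` defaults are assumptions chosen for illustration. The gate lets a notification through, then enforces a quiet window that doubles while updates keep arriving in bursts and resets after a calm period.

```typescript
// Hypothetical helper, not an SDK API: decides whether a change handler
// should be notified at time `nowMs` (milliseconds).
function createNotificationBackoff(
  baseMs: number = 1000,
  maxMs: number = 60000,
): (nowMs: number) => boolean {
  let windowMs = 0; // current quiet window after a notification
  let lastNotifiedAt = Number.NEGATIVE_INFINITY;

  return (nowMs: number): boolean => {
    const sinceLast = nowMs - lastNotifiedAt;
    if (sinceLast < windowMs) {
      return false; // still inside the quiet window: suppress
    }
    const current = Math.max(windowMs, baseMs);
    // A long calm period resets the window; otherwise it doubles, up to maxMs.
    windowMs = sinceLast >= 2 * current ? baseMs : Math.min(current * 2, maxMs);
    lastNotifiedAt = nowMs;
    return true;
  };
}
```

A consumer could call the returned function with `Date.now()` inside its change handler and skip the UI update when it returns `false`. A real implementation would probably also need to deliver the latest suppressed value once the window closes, so that the UI does not end up stale.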

I'm not sure I fully understand the description of what's happening.

  1. By "keeps flushing flag updates", do you mean actual flag updates that were due to changes on the LD dashboard (but were delivered multiple times or in the wrong order), or updates that should not have happened at all? The reason I ask is that if LD sends an update that does not actually change the value of the flag, that should not cause the SDK to send an update event to your code. And even if it did send an event, if the value was the same as before, I'm not clear on why that could cause "buttons appearing and disappearing."

  2. Do you mean it is "flushing flag updates" over an existing streaming connection, or it is losing the connection and starting a new one repeatedly? It should be possible to see this in the browser console. (Although, again, I'm not clear on why restarting the streaming connection would cause the behavior you're describing, since it would be obtaining the same flag values as before.)

It might be easier for me to understand this if you said more about how your UI logic works in response to change events. Normally I would imagine that "buttons appearing and disappearing" would only happen if you got a change event that, for instance, changed the "show-button-x" flag from true to false or vice versa.

About your backoff idea - that is how the other SDKs work, and the main reason the JS SDK doesn't use a backoff for reconnections is that it normally relies on the browser's built-in EventSource implementation, which provides no way to do that. So if we were to add that behavior, it would have to be in a polyfill implementation. People already do have to use polyfills sometimes if they want streaming to work in (for instance) Internet Explorer, so that might not be too bad, although right now they can use any EventSource polyfill and this would require them to use a specific one that we provide.
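
As a rough sketch of what a reconnection backoff inside such a polyfill could compute: the function below returns the delay before reconnection attempt number `attempt`, doubling from a base delay up to a cap and adding random jitter. The name `reconnectDelayMs` and the default values are assumptions for illustration, not the parameters used by any LaunchDarkly SDK.

```typescript
// Hypothetical helper: delay before the Nth reconnection attempt (0-based).
// `random` is injectable so the function can be tested deterministically.
function reconnectDelayMs(
  attempt: number,
  baseMs: number = 1000,
  maxMs: number = 30000,
  random: () => number = Math.random,
): number {
  // Exponential growth: base, 2*base, 4*base, ... capped at maxMs.
  const capped = Math.min(baseMs * Math.pow(2, attempt), maxMs);
  // Jitter: pick a value between half of the capped delay and the full
  // capped delay, so many clients do not all reconnect at the same instant.
  return capped / 2 + random() * (capped / 2);
}
```

A polyfill would call something like this from its error handler, wait for the returned delay before opening a new connection, and reset the attempt counter once a connection has stayed healthy for a while.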

Sorry for the hard-to-understand issue description; I wrote it up in between other things right after we had the incident.

  1. Infinite flag flushing: a change on the LD dashboard is made only once. Afterwards, flag updates continue to be "pushed" to the SDK. As a result, the application always assumes that flags have changed, and toggled elements in the UI briefly disappear and then reappear, or the other way round.
  2. New or old connection: I am not 100% sure on that. I can upload a gif I made when it happened.

You can find a large recording here. Is there anything else I can provide or help with?

Unfortunately it's a bit hard for me to understand what I'm looking at in that animated GIF.

I could see that when you turned targeting off for the flag, the browser started making a series of HTTP requests - first a regular request, and then an EventSource request, each time - and I could see some kind of repeated updates happening on the page, as you said.

However, it also looked like when you then turned targeting on for the flag, all of that stopped and the page became stable.

If I'm correct about the above, then I strongly suspect that this is not a case of the SDK pushing repeated flag updates. Here's why I think so:

  • The JS SDK has no ability to detect specifically whether the flag has targeting turned on or off. All it sees is the resulting flag variation for the current user, and the SDK's behavior does not depend in any way on what that variation is for any given flag; all it does is pass that value on to your application code. Therefore, I can't think of a mechanism where it would only get into an infinite loop of updates when the flag is turned off.
  • As I mentioned earlier, when there is a flag update, the SDK gives your application code the new flag value—either because you requested it with variation, or because you subscribed for change events. If you are not actually changing the value back and forth, then the new value will be stable: that is, even if you were getting redundant change events for some reason, the new value would be "true", "true", "true", "true" or "false", "false", "false", "false", not "true", "false", "true", "false". So I can't think how that would produce the behavior of your UI flipping back and forth. Again, it is hard for me to say more without knowing anything about how your UI logic works in response to change events.
  • In your screen recording, I can't see the exact URLs of these requests, nor the response content (or event log for the EventSource requests), so I can't tell what values LD is actually delivering to your app. But I can see that it is making new connections, which is not normal if the only thing that changed was a flag value. The whole point of the streaming EventSource is that the updates get pushed along the same connection.
  • Even if we lose the EventSource connection, it should only be re-establishing that connection, not making an additional (non-streaming) GET request. The latter should only happen if you changed the current user properties by calling identify, which causes it to re-request all the flag values.
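
One way to check the second point from the application side is to look at each change payload and see which flags actually carry a different value. The sketch below assumes the payload maps each flag key to its current and previous values; the names `FlagChange` and `realChanges` are hypothetical and only serve the illustration.

```typescript
// Assumed payload shape: flag key -> { current, previous }.
interface FlagChange {
  current: unknown;
  previous: unknown;
}

// Returns the keys of flags whose value really differs from the previous one.
// Values are compared via JSON serialization, which is adequate for JSON flag
// values but treats objects with different key order as different.
function realChanges(changes: Record<string, FlagChange>): string[] {
  return Object.keys(changes).filter(
    (key) =>
      JSON.stringify(changes[key].current) !==
      JSON.stringify(changes[key].previous),
  );
}
```

Logging the result for every change event during an incident would show whether the values are stable ("true", "true", "true") or really alternate ("true", "false", "true").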

In summary, from what I'm seeing, this does not look like the SDK receiving or delivering redundant updates. It looks to me like what you would get if you were calling identify many times with different user properties. Is there any possibility that your code could do such a thing? And if so, would that behavior depend on whether this particular flag is on or off?
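
If it is unclear whether the application (or a library wrapping the SDK) calls identify repeatedly, a small tracing wrapper can answer that during the next incident. This is a sketch under assumptions: `IdentifyCapable` is a hypothetical minimal interface standing in for the client object, and `traceIdentify` is not an SDK function.

```typescript
// Hypothetical minimal view of a client that exposes an identify method.
interface IdentifyCapable {
  identify(...args: unknown[]): unknown;
}

// Wraps client.identify so that every call is counted and logged.
// Returns a function that reports how many calls have been seen so far.
function traceIdentify(
  client: IdentifyCapable,
  log: (message: string) => void,
): () => number {
  const original = client.identify.bind(client);
  let calls = 0;
  client.identify = (...args: unknown[]): unknown => {
    calls += 1;
    // The first argument is assumed to be the user object.
    log("identify call #" + calls + ": " + JSON.stringify(args[0]));
    return original(...args);
  };
  return () => calls;
}
```

Passing a logger that also records a stack trace (for example one built on `console.trace`) would show where each call originates.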

Thanks for getting back. We've not seen the issue appear again for 14 days now. I also checked our code; we don't seem to call identify repeatedly anywhere.

I suggest that I try to get better insight whenever the issue occurs again and re-open this issue then, debugging the situation in the direction you suggested. Thanks again for your time.