launchdarkly/java-server-sdk

Mutable HttpConfiguration Headers ?

steebe opened this issue · 4 comments

Hey! I'm @steebe from Mercato

Is your feature request related to a problem? Please describe.

Background

We're operating within a monolith, that shares quite a few resources between our application server code and a collection of scripts. The scripts in question are simple JVM processes that are cronned to do various tasks. When the following occurs, it keeps the respective script from exiting successfully. On a few occasions, we have been experiencing script hangs due to underlying HTTP Connection behavior in the LaunchDarkly Java SDK.

WARN 18:54:25:173 okhttp-eventsource-stream-[]-0 eventsource.SLF4JLogger:34 - Connection unexpectedly closed
WARN 18:54:25:186 okhttp-eventsource-stream-[]-0 server.Util:142 - Error in stream connection (will retry): java.io.EOFException
INFO 18:54:25:197 okhttp-eventsource-stream-[]-0 eventsource.SLF4JLogger:29 - Waiting 1910 milliseconds before reconnecting...
WARN 18:54:30:027 okhttp-eventsource-stream-[]-0 server.Util:142 - Error in stream connection (will retry): HTTP error 503
WARN 18:54:30:028 okhttp-eventsource-events-[]-0 server.StreamProcessor$StreamEventHandler:354 - Encountered EventSource error: com.launchdarkly.shaded.com.launchdarkly.eventsource.UnsuccessfulResponseException: Unsuccessful response code received from stream: 503
WARN 18:54:30:028 okhttp-eventsource-stream-[]-0 eventsource.SLF4JLogger:34 - Connection unexpectedly closed
WARN 18:54:30:028 okhttp-eventsource-stream-[]-0 server.Util:142 - Error in stream connection (will retry): java.io.EOFException
INFO 18:54:30:028 okhttp-eventsource-stream-[]-0 eventsource.SLF4JLogger:29 - Waiting 2287 milliseconds before reconnecting...
WARN 18:54:34:060 okhttp-eventsource-stream-[]-0 server.Util:142 - Error in stream connection (will retry): HTTP error 503
WARN 18:54:34:060 okhttp-eventsource-events-[]-0 server.StreamProcessor$StreamEventHandler:354 - Encountered EventSource error: com.launchdarkly.shaded.com.launchdarkly.eventsource.UnsuccessfulResponseException: Unsuccessful response code received from stream: 503
WARN 18:54:34:060 okhttp-eventsource-stream-[]-0 eventsource.SLF4JLogger:34 - Connection unexpectedly closed
WARN 18:54:34:061 okhttp-eventsource-stream-[]-0 server.Util:142 - Error in stream connection (will retry): java.io.EOFException
INFO 18:54:34:061 okhttp-eventsource-stream-[]-0 eventsource.SLF4JLogger:29 - Waiting 4367 milliseconds before reconnecting...
INFO 18:54:38:504 okhttp-eventsource-stream-[]-0 eventsource.SLF4JLogger:29 - Connected to EventSource stream.
WARN 18:15:52:378 okhttp-eventsource-stream-[]-0 eventsource.SLF4JLogger:34 - Connection unexpectedly closed
WARN 18:15:52:380 okhttp-eventsource-stream-[]-0 server.Util:142 - Error in stream connection (will retry): java.io.EOFException

There is an old issue ( #93 ) that directly relates to this. It seems to have bounced around for a while here before being closed due to it not being at all trivial to address. Regardless, I think it's important to note that it's not strictly limited to Tomcat behavior.

After looking into things, I think what's happening is that LD treats each of its HTTP calls (performed via okhttp3 libraries) as streams with no explicit close behavior enforced at the connection level, other than a timeout. It looks like the stream is interrupted at a specific point, or experiences a certain state while it's being interrupted, it hangs with constant attempts to reconnect to the server it established the stream with.

Obviously, I'm still a bit naïve to the specific chain of events that is causing our specific issue, but a combination of this underlying streaming behavior and the way that our JVM processes' runtime exit is resulting in the above intermittently.

Notes

  1. This change to your underlying HTTP Event Source wrapper would in theory solve our problem, but it would also likely cause weird/bad behavior within the downstream LD SDKs (and your servers, for that matter)
  2. Perhaps, a mechanism to specify additional headers (or overrides) within our SDK's connection behavior, and do so via your HttpConfigurationBuilder would solve the problem, but the server SDK prevents this with Immutability. I suspect this was a design choice to prevent the potential problems mentioned above in (1)

Describe the solution you'd like
I suppose I'm looking for either confirmation of my suspicions in (1) + (2) above, and/or acknowledgement that there are no intentions on changing the Java SDK's capability to provide additional headers beyond the currently immutable defaults would suffice 😄
In that case, we will just expect to keep LD out of our scripts area for the foreseeable future.

Describe alternatives you've considered
As mentioned above:

  • Omitting LD from our scripts entirely (our present solution/work-around)

Additional context
Apologies if I've provided incorrect context about the okhttp3 behavior, or if I've overlooked any mechanisms provided for mitigating this behavior thru the SDKs themselves (outside of cloning & publishing entirely new versions of them).

Sorry you've been running into this difficulty. I'll do my best to respond from the point of view of an SDK maintainer— although it's possible that it will be hard to diagnose what's going on without more access to the details of your environment and knowledge of what's in the scripts you are running, in which case it might be best to take it to a support ticket through support.launchdarkly.com instead of this public forum.

There are a few things I'm unclear on here:

  1. You said that when this happens, it "keeps the respective script from exiting successfully". How? In other words, where is it hanging: in the call to the LDClient constructor, or at some later point?
  2. I don't see how the stream reconnect behavior you're describing would cause a hang. It is normal and intended for the SDK's stream connection to attempt a reconnect if it gets an HTTP 503 error, as is happening here, since such an error normally indicates a temporary service interruption. The only time that the SDK would permanently give up on a stream connection is if it got a 401 or 403 error, since that would indicate that the SDK key you're trying to use is invalid and can never work. But in any case, stream retries happen on a worker thread and should not prevent your code from proceeding with whatever it wants to do. It should stop retrying if you shut down the SDK with client.close().
  3. I also am not sure how Connection: close would be relevant to this. If your guess is that the application is being kept from exiting because an HTTP connection is still open— do you have specific evidence that that's the case? The Tomcat issue you linked to was not about that; it was about OkHttp maintaining a daemon thread even after the HTTP client was completely shut down and after all connections were closed, which I don't think is an issue in current versions of OkHttp.

To more directly answer the question that's in the issue title: it'd be reasonable and pretty straightforward for us to add a "put this specific header into all of the SDK's HTTP requests" method to HttpConfigurationBuilder. That's an ability that does exist in some of the other SDKs, and it could be useful for certain back-end configurations where there's some kind of gateway in between the SDK and LD that expects a special header. However, I'm still not sure that is relevant to the specific issue of these hangs that you're seeing; it might be a red herring. And in any case if there is a general problem with HTTP request semantics that would be addressed by adding Connection: close or any other header, then presumably that would be a potential problem in any application, not just in yours, and should be fixed in the SDK code rather than by requiring the app to set some special configuration.

@eli-darkly -- thank you for the detailed responses, and sorry again for probably not looking at the problem from the right angles. We did turn out to have an issue in which our central executor, responsible for invoking our scripts, did possess a race case that prevented client.close() from being called after executing certain routines that close some of our proprietary connections.

This ultimately makes my initial blurb a moot point; I'll close this issue. Thank you for all of the information in your first post's second point -- it's helpful to get that insight into SDK behavior in certain response scenarios.

Apologies again for conflating the issue board here, and thanks for the replies.

No worries—glad to hear it worked out!