netty/netty

Intermittent SSL handshake timeout after upgrade to Netty 4.1.104.Final

Netty version

4.1.104.Final

JVM version (e.g. java -version)

openjdk version "17.0.11" 2024-04-16 LTS
OpenJDK Runtime Environment 1.0.1830.0 (build 17.0.11+10-LTS)
OpenJDK 64-Bit Server VM 1.0.1830.0 (build 17.0.11+10-LTS, mixed mode)

OS version (e.g. uname -a)

Linux ip-10-1-3-39.ec2.internal 5.10.216-204.855.amzn2.aarch64 #1 SMP Sat May 4 16:53:24 UTC 2024 aarch64 aarch64 aarch64 GNU/Linux

We use Netty as a proxy in front of our RDS server. After upgrading from 4.1.48.Final to 4.1.104.Final, we see intermittent SSL handshake timeouts. Our handshake timeout is set to 10 seconds, and the overall failure rate is low, around 0.002%.

Netty is the SSL server, and the provider is JDK, not OpenSSL.

Because the problem is intermittent and hard to reproduce, we added more detail to the log:

sslHandler.engine().isInboundDone() -> false
sslHandler.engine().isOutboundDone() -> true
sslHandler.engine().getHandshakeStatus() -> NEED_UNWRAP
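For anyone who wants to capture the same three fields, here is a minimal sketch using only the JDK `SSLEngine` API. The class name `EngineStateDump` and the helpers `describe` and `newServerEngine` are hypothetical names for illustration; in a Netty pipeline the engine would come from `sslHandler.engine()` rather than from a freshly created one. `NEED_UNWRAP` means the engine is waiting for more data from the peer before it can make progress.

```java
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;
import java.security.NoSuchAlgorithmException;

public class EngineStateDump {
    // Formats the three engine fields quoted in the log lines above.
    static String describe(SSLEngine engine) {
        return "inboundDone=" + engine.isInboundDone()
                + " outboundDone=" + engine.isOutboundDone()
                + " handshakeStatus=" + engine.getHandshakeStatus();
    }

    // Stand-in for sslHandler.engine(): a fresh server-mode JDK engine.
    static SSLEngine newServerEngine() {
        try {
            SSLEngine engine = SSLContext.getDefault().createSSLEngine();
            engine.setUseClientMode(false);
            return engine;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("Default SSLContext unavailable", e);
        }
    }

    public static void main(String[] args) {
        // No handshake has started here, so the status is NOT_HANDSHAKING.
        System.out.println(describe(newServerEngine()));
    }
}
```

Logging this string together with the remote address and a timestamp at the moment the handshake fails makes it easier to correlate the engine state with packet captures.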

We initially suspected that certain hosts/clients could not respond in time, but after scanning the fleet we found that it happens randomly across all callers after the upgrade.

Could this be a defect in 4.1.104.Final?

Any help is appreciated!

Since you are upgrading from 4.1.48.Final to 4.1.104.Final, one major change between these versions is that TLSv1.3 has been enabled by default since Netty 4.1.52.Final (when the JDK contains TLSv1.3, see #10451). TLSv1.3 support is available in Java 8 since 8u262 and in Java 11 or newer.
The handshake protocol in TLSv1.3 differs from the one in TLSv1.2, as explained in https://stackoverflow.com/a/62465859 .
This might have an impact in your case. Have you logged the TLS protocol version when the timeout happens?
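One way to check this is to pin the engine to TLSv1.2 and log the protocol of the in-progress or established session. The sketch below uses only the JDK `SSLEngine` API; the class name `ProtocolPinning` and its helper methods are hypothetical, and in a Netty server the engine would be the one behind `sslHandler.engine()`.

```java
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;
import javax.net.ssl.SSLSession;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

public class ProtocolPinning {
    // Creates a server-mode engine that only offers TLSv1.2.
    static SSLEngine newTls12OnlyEngine() {
        try {
            SSLEngine engine = SSLContext.getDefault().createSSLEngine();
            engine.setUseClientMode(false);
            engine.setEnabledProtocols(new String[] {"TLSv1.2"});
            return engine;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("Default SSLContext unavailable", e);
        }
    }

    // Prefers the session under construction (non-null only while a handshake
    // is in progress) and falls back to the engine's current session.
    static String protocolForLogging(SSLEngine engine) {
        SSLSession inProgress = engine.getHandshakeSession();
        SSLSession session = (inProgress != null) ? inProgress : engine.getSession();
        return session.getProtocol();
    }

    public static void main(String[] args) {
        SSLEngine engine = newTls12OnlyEngine();
        System.out.println("enabled=" + Arrays.toString(engine.getEnabledProtocols()));
        System.out.println("protocol=" + protocolForLogging(engine));
    }
}
```

Calling `protocolForLogging` from the handshake-failure path would record which protocol version, if any, had been selected when the timeout fired.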

Hi Lari,

Thanks for the reply. We disabled TLSv1.3. For the caller sessions that experienced the handshake timeout we don't have a log of the final TLS version, but the event records for the same callers show that they always use TLSv1.2:

the selected protocol [TLSv1.2], selected cipher [TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384]

So the TLS version does not seem to be the issue here.

Thanks!

It's also worth checking if "Configuring default extensions" is related.

Some TLS implementations may not handle unknown extensions properly. As a result, you might encounter unexpected interoperability issues when the JDK introduces new extensions.

(check https://bugs.openjdk.org/browse/JDK-8217633 for more details)
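To the best of my knowledge, the change tracked there is controlled through the system properties `jdk.tls.server.disableExtensions` and `jdk.tls.client.disableExtensions`; treat those names as an assumption and verify them against the JSSE documentation for your JDK build. A small sketch (the class name `TlsExtensionProps` is hypothetical) that reports whether either property is set in the running JVM:

```java
public class TlsExtensionProps {
    // Property names as I recall them from JDK-8217633; verify for your JDK.
    static final String SERVER_PROP = "jdk.tls.server.disableExtensions";
    static final String CLIENT_PROP = "jdk.tls.client.disableExtensions";

    // Returns "name=value", or "name=(not set)" when the property is absent.
    static String describe(String name) {
        String value = System.getProperty(name);
        return name + "=" + (value == null ? "(not set)" : value);
    }

    public static void main(String[] args) {
        System.out.println(describe(SERVER_PROP));
        System.out.println(describe(CLIENT_PROP));
    }
}
```

Printing these at startup shows whether the proxy is running with the default extension set or with some extensions disabled.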

Hi Lari,

Thanks for the reply. I have checked the doc, and it is unlikely that our problem is caused by the issue you linked above: we upgraded to Netty 4.1.104 first and the problem showed up; we later upgraded to JDK 17 and still see it.

Thanks

It's unlikely, but possible. I guess your problem is also unlikely to happen. 😉
It's not related to upgrading to JDK 17. JDK 17 (and also many older JDK updates) lets you configure the behavior.