unexpected eof while reading:ssl/record/rec_layer_s3.c:320 when enable semeruCloudCompiler with Liberty Java21
Opened this issue · 21 comments
Test InstantOn with Open Liberty 24.0.0.2 Java21 on Power9 OCP 4.14 cluster. I deployed the checkpoint app image to the cluster using Open Liberty Operator 1.3.1 with JIT enable (.spec.semeruCloudCompiler.enable: true)
apiVersion: apps.openliberty.io/v1
kind: OpenLibertyApplication
metadata:
name: myapp
spec:
applicationImage: <my-checkpoint-app-image>
semeruCloudCompiler:
enable: true
replicas: 1
...............
- View the semeru compiler pod log and see it has the following error:
#INFO: StartTime: Feb 16 14:37:39 2024
#INFO: TimeZone: UTC (UTC)
JITServer is ready to accept incoming requests
A0F04FE4FE7F0000:error:0A000126:SSL routines:ssl3_read_n:unexpected eof while reading:ssl/record/rec_layer_s3.c:320:
A0F04FE4FE7F0000:error:0A000126:SSL routines:ssl3_read_n:unexpected eof while reading:ssl/record/rec_layer_s3.c:320:
A0F04FE4FE7F0000:error:0A000126:SSL routines:ssl3_read_n:unexpected eof while reading:ssl/record/rec_layer_s3.c:320:
A0F04FE4FE7F0000:error:0A000126:SSL routines:ssl3_read_n:unexpected eof while reading:ssl/record/rec_layer_s3.c:320:
#JITServer: t= 21652 A new client (clientUID=18080393426921679733) connected. Server allocated a new client session.
A0F04FE4FE7F0000:error:0A000126:SSL routines:ssl3_read_n:unexpected eof while reading:ssl/record/rec_layer_s3.c:320:
A0F04FE4FE7F0000:error:0A000126:SSL routines:ssl3_read_n:unexpected eof while reading:ssl/record/rec_layer_s3.c:320:
A0F04FE4FE7F0000:error:0A000126:SSL routines:ssl3_read_n:unexpected eof while reading:ssl/record/rec_layer_s3.c:320:
A0F04FE4FE7F0000:error:0A000126:SSL routines:ssl3_read_n:unexpected eof while reading:ssl/record/rec_layer_s3.c:320:
A0F04FE4FE7F0000:error:0A000126:SSL routines:ssl3_read_n:unexpected eof while reading:ssl/record/rec_layer_s3.c:320:
A0F04FE4FE7F0000:error:0A000126:SSL routines:ssl3_read_n:unexpected eof while reading:ssl/record/rec_layer_s3.c:320:
A0F04FE4FE7F0000:error:0A000126:SSL routines:ssl3_read_n:unexpected eof while reading:ssl/record/rec_layer_s3.c:320:
A0F04FE4FE7F0000:error:0A000126:SSL routines:ssl3_read_n:unexpected eof while reading:ssl/record/rec_layer_s3.c:320:
A0F04FE4FE7F0000:error:0A000126:SSL routines:ssl3_read_n:unexpected eof while reading:ssl/record/rec_layer_s3.c:320:
A0F04FE4FE7F0000:error:0A000126:SSL routines:ssl3_read_n:unexpected eof while reading:ssl/record/rec_layer_s3.c:320:
A0F04FE4FE7F0000:error:0A000126:SSL routines:ssl3_read_n:unexpected eof while reading:ssl/record/rec_layer_s3.c:320:
A0F04FE4FE7F0000:error:0A000126:SSL routines:ssl3_read_n:unexpected eof while reading:ssl/record/rec_layer_s3.c:320:
A0F04FE4FE7F0000:error:0A000126:SSL routines:ssl3_read_n:unexpected eof while reading:ssl/record/rec_layer_s3.c:320:
A0F04FE4FE7F0000:error:0A000126:SSL routines:ssl3_read_n:unexpected eof while reading:ssl/record/rec_layer_s3.c:320:
A0F04FE4FE7F0000:error:0A000126:SSL routines:ssl3_read_n:unexpected eof while reading:ssl/record/rec_layer_s3.c:320:
A0F04FE4FE7F0000:error:0A000126:SSL routines:ssl3_read_n:unexpected eof while reading:ssl/record/rec_layer_s3.c:320:
I also see the same errors on Azure Kubernetes Service (AKS) cluster
I'll have a look. A question though: is this seen only on Power (i.e. is x86-64 running fine)?
Thanks. I have a few more questions to limit my search space:
- Is it seen only with Java21, or does Java 17 have the same problem?
- If, Java 17 also shows the problem, what was the first OpenJ9 release that started to show the problem?
Another question: did we ever had a combination of InstantOn + Semeru Cloud Compiler that worked ok in OCP when deployed with the Liberty Operator?
I have tested Liberty 24.0.0.1 + Java 21 release 0.42.0 with InstantON and Semeru Cloud Compiler in containers, but not in OCP and it's working ok. This is why I am trying to understand whether this test using the Liberty Operator has ever passed.
this is the first time I test InstantOn + Semeru Cloud Compiler on OCP4.14
this is the first time I test InstantOn + Semeru Cloud Compiler on OCP4.14
Then, it's possible that there is something wrong at the Operator level on how the client and server are setup. The symptoms indicate that the certificate at the client does not match the one at the server.
I also see the error on x86 OCP 4.14 cluster with Java 21 but not with Java 17
@tam512 In another issue(performance related) you deployed Liberty+Semeru Cloud Compiler with Java 21 in OCP using yaml files. Did that experiment use encryption? If so, I would rather say it's an Operator issue rather than a JVM issue. Thanks
InstantOn app with Liberty Operator and Semeru Cloud Compiler was validated on OCP as part of this issue: OpenLiberty/open-liberty-operator#438 (comment)
I am not sure which Java version was used, but Liberty images with Java 21 weren't available then.
Liberty Operator generates the certificate and sets up the server and client the same way whether the app is InstantOn or not. The only difference is that the InstantOn app client gets the server details via the OPENJ9_RESTORE_JAVA_OPTIONS
env.
I'll suggest trying this scenario on OCP without Liberty Operator.
I also wonder whether InstantOn could possibly be overridding the certificate, so instead of using the certificate mounted at runtime (by the Operator), a certificate from the snapshot is used.
I tried the same test, deploy app without InstantOn using Open Liberty operator+semeruCloudCompiler, I still see the same errors so it's not instantOn related
Once, I'll get my hands on a K8 cluster I'll try to reproduce. Just using containers I don't see the issue.
I created a Kubernetes cluster based on Microk8s and a container with Liberty 24.0.0.1 and AcmeAir app.
If I deploy the container with OLO using Java 21 I see the problem mentioned in this issue.
If I deploy the container with OLO using Java 17, everything is fine.
If I deploy the container with yaml, using either Java 17 or Java 21 everything is fine.
Thinking that the certificates created by OLO+Java21 may be wrong, I deployed my client+server solution with yaml files, but pointing at the certificates from the TLS secret created by OLO. This was also successful which means that the certificates are good.
In another experiment, I connected my AcmeAir container deployed with yaml to the server deployed by OLO. This also fails, so this may point at a difference between the server launched by OLO and the server launched with yaml, but so far I haven't found any.
I added verbose options to the client and I see that it successfully connects to the server and the server successfully compiles methods even though those "unexpected eof" messages appear at the server.
I googled about those messages and found this text that may or may be not relevant:
OpenSSL 3 (a major release) changed some behaviour compared to 1.1.1 with respect to peers that
fail to shutdown a TLS connection cleanly.
Previously, if a peer unexpectedly shutdown a connection an OpenSSL IO function (such as SSL_read())
would report an error and SSL_get_error() would report SSL_ERROR_SYSCALL and errno would be 0.
This was considered a bug in 1.1.1 (you should never get SSL_ERROR_SYSCALL but with errno as 0).
However fixing it in 1.1.1 broke some apps. We delayed the fix until the next major version (OpenSSL 3.0).
In OpenSSL 3.0 this error is now reported from SSL_get_error() as SSL_ERROR_SSL and the
unexpeced eof while reading error is put on the OpenSSL error stack.
We also added a new option SSL_OP_IGNORE_UNEXPECTED_EOF which treats an unexpected EOF
from the peer as if they had performed an orderly shutdown.
In order to further debug this issue, it would be very helpful to enable logging at the server by defining an environment variable:
_JAVA_OPTIONS="-Xjit:verbose={JITServer},verbose={compilePerformance}
@leochr Is there a way to inject JVM options to the Semeru Cloud Compiler pod?
I tried
semeruCloudCompiler:
enable: true
replicas: 1
env:
- name: _JAVA_OPTIONS
value: "-Xjit:verbose"
but spec.semeruCloudCompiler.env
is not defined.
When I create my containers with Java 21 I see
#JITServer: Successfully initialized SSL context (OpenSSL 3.0.7 1 Nov 2022)
but when I create my containers with Java 17 I see
#JITServer: Successfully initialized SSL context (OpenSSL 1.1.1k FIPS 25 Mar 2021)
Based on the text that I pasted in the previous comment, it's very likely that the change from OpenSSL 1.1.1 to OpenSSL 3.0.7 made those "unexpected eof" messages appear. I still cannot explain why those messages do not appear when I start JITServer with yaml instead of OLO.
@mpirvu It's not possible to set env on Semeru Cloud Compiler via the OpenLibertyApplication CR. But you could set it directly on the Semeru Cloud Compiler Deployment (i.e. <CR_name>-semeru-compiler-1) that's created by OLO. Make sure to scale down the OLO Deployment olo-controller-manager
to 0, otherwise OLO will revert the manual changes to the compiler's Deployment.
Once debug is complete, scale OLO Deployment back to 1 to resume Operator reconciliation.
Please attach the yamls that were using. I'll take a look to see if I can spot any difference that could cause the behaviour change. Thank you
I think I understand what's causing those 'unexpected eof' messages : the server defined by OLO adds liveness and readiness probes. I believe the probes establish a connection on the 38400 port, but not a full ssl connection. I managed to replicate the errors on my JITServer deployed with yaml files by adding liveness/readiness probes like OLO does.
I was puzzled by the fact that the server shows some connection errors, while the client does not show any trace of a connection problem. The probing mechanism explains the mystery.
Searching the net I did not find a way to send a probe on a TCP port using encryption.
There are two possible solutions here:
- Filter out the error message (basically going for an OpenSSL 1.1.1 solution).
- Add another port that will be dedicated to receiving unencrypted probes.
The second solution is used in Hazelcast. Citing from their documentation:
since Kubernetes liveness/readiness probes cannot use SSL, you need to prepare a Hazelcast configuration with a separate non-secured port opened for health checks.