Failed to connect to all addresses
ChrisKujawa opened this issue
Describe the bug
If you use TLS with the Zeebe Gateway, you might see errors like: failed to connect to all addresses.
This is caused by the expiry of the old root certificate used by Let's Encrypt (DST Root CA X3, expired on September 30, 2021): https://techcrunch.com/2021/09/21/lets-encrypt-root-expiry/
The gRPC core library, which is used inside the .NET gRPC library that the Zeebe C# client depends on, has a bug (grpc/grpc#27532): it does not choose the right certificate when communicating with a secured endpoint.
To verify that you are affected, set the following environment variables:
export GRPC_VERBOSITY=debug
export GRPC_TRACE=tcp,http,api
The client's log output should then contain something like:
Handshake failed with fatal error SSL_ERROR_SSL: error:1000007d:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED.
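If the client output is long, it can help to capture it to a file and search for the failure. The sketch below assumes the debug variables above were exported before the client started and that both stdout and stderr were redirected into one log file; the helper name has_cert_verify_failure and the dotnet run command are illustrative placeholders, not part of the Zeebe client.

```shell
#!/bin/sh
# Sketch: check a captured client log for the TLS handshake failure.
# Assumes GRPC_VERBOSITY=debug and GRPC_TRACE=tcp,http,api were exported
# before the client ran, and that its output was saved to a file.

# Hypothetical helper: returns 0 if the log contains the verification error.
has_cert_verify_failure() {
  log_file="$1"
  grep -q 'CERTIFICATE_VERIFY_FAILED' "$log_file"
}

# Example usage (client command and log path are placeholders):
#   dotnet run > client.log 2>&1
#   if has_cert_verify_failure client.log; then
#     echo "TLS handshake failed: likely the Let's Encrypt root expiry issue"
#   fi
```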
To work around this, you can set the gRPC root certificate via an environment variable on the client side:
GRPC_DEFAULT_SSL_ROOTS_FILE_PATH=/etc/ssl/certs/ISRG_Root_X1.pem
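A small guard can avoid pointing gRPC at a file that does not exist. This is a minimal sketch: the helper name set_grpc_roots is hypothetical, and the default path assumes a Debian/Ubuntu-style trust store where the ca-certificates package provides ISRG_Root_X1.pem. On systems without that file, Let's Encrypt publishes the certificate at https://letsencrypt.org/certs/isrgrootx1.pem, and you can pass its local path instead.

```shell
#!/bin/sh
# Sketch: export GRPC_DEFAULT_SSL_ROOTS_FILE_PATH before starting the client.
# Default path assumes a Debian/Ubuntu-style trust store (ca-certificates).

# Hypothetical helper: exports the variable only if the PEM file is readable.
set_grpc_roots() {
  cert="${1:-/etc/ssl/certs/ISRG_Root_X1.pem}"
  if [ -r "$cert" ]; then
    export GRPC_DEFAULT_SSL_ROOTS_FILE_PATH="$cert"
    return 0
  fi
  echo "root certificate not readable: $cert" >&2
  return 1
}

# Example usage (client command is a placeholder):
#   set_grpc_roots && dotnet run
```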
Alternatively, if you have many clients and use Kubernetes with cert-manager, you can set the preferred chain on the ClusterIssuer by adding the preferredChain field (see the related comment):
apiVersion: cert-manager.io/v1alpha2
kind: ClusterIssuer
metadata:
  name: ...
spec:
  acme:
    privateKeySecretRef:
      name: ...
    server: ....
    email: ...
    preferredChain: ISRG Root X1   # added line
See also https://cert-manager.io/docs/configuration/acme/#use-an-alternative-certificate-chain
I will keep this open until there is a bug fix release for grpc-core.
Should be fixed with #346 🤞