SSLV3_ALERT_BAD_CERTIFICATE: failed calling webhook "connaisseur-svc.connaisseur.svc":
kgopi1 opened this issue · 8 comments
Describe the bug
failed calling webhook "connaisseur-svc.connaisseur.svc": failed to call webhook: Post "https://connaisseur-svc.connaisseur.svc:443/mutate?timeout=30s": x509: certificate signed by unknown authority (possibly because of "x509: invalid signature: parent certificate cannot sign this kind of certificate" while trying to verify candidate authority certificate "connaisseur-svc.connaisseur.svc")
Expected behaviour
Connaisseur should allow application pods to run successfully.
When a few pods failed with the above error, I checked the Connaisseur pods and found the log below.
The connaisseur-tls secret was created with type Opaque; it should be kubernetes.io/tls.
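(For reference, the secret type can be checked with something like the following; the namespace is assumed to be connaisseur:)
kubectl -n connaisseur get secret connaisseur-tls -o jsonpath='{.type}'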
##########
Error in HTTPServer.serve
Traceback (most recent call last):
File "/usr/local/lib/python3.11/site-packages/cheroot/server.py", line 1807, in serve
self._connections.run(self.expiration_interval)
File "/usr/local/lib/python3.11/site-packages/cheroot/connections.py", line 198, in run
self._run(expiration_interval)
File "/usr/local/lib/python3.11/site-packages/cheroot/connections.py", line 241, in _run
new_conn = self._from_server_socket(self.server.socket)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/cheroot/connections.py", line 295, in _from_server_socket
s, ssl_env = self.server.ssl_adapter.wrap(s)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/cheroot/ssl/builtin.py", line 270, in wrap
s = self.context.wrap_socket(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/ssl.py", line 517, in wrap_socket
return self.sslsocket_class._create(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/ssl.py", line 1075, in _create
self.do_handshake()
File "/usr/local/lib/python3.11/ssl.py", line 1346, in do_handshake
self._sslobj.do_handshake()
ssl.SSLError: [SSL: SSLV3_ALERT_BAD_CERTIFICATE] sslv3 alert bad certificate (_ssl.c:1002)
####################
Optional: To reproduce
Optional: Versions (please complete the following information as relevant):
- OS:
- Kubernetes Cluster: AKS 1.25.6
- Notary Server:
- Container registry:
- Connaisseur: 1.6.1
- Other:
Optional: Additional context
Is this a persistent problem, or does it happen once and everything works fine after reinstalling?
This happens whenever the AKS cluster is restarted.
I'd suggest upgrading Connaisseur. Version 1.6.1 is fairly old. That might fix things from the get-go.
I am using Helm chart version 1.6.1. The latest chart version is 2.0.0, which requires Rekor for image signature verification.
Ah, ok. First of all, having Connaisseur installed in a cluster that keeps on restarting is a bit of a problem. This is not an intended use case, so you will probably run into more problems along the way.
That being said, my guess would be that the certificates no longer match. Here are some commands for finding them (there are three, and all commands need to be run in the Connaisseur namespace):
- The certificate inside the Connaisseur container. Can be acquired with (substitute a Connaisseur pod name):
kubectl exec <name-of-a-Connaisseur-pod> -- cat certs/tls.crt
- The certificate stored in the Kubernetes secret. This is what the Connaisseur pods pull. Command:
kubectl get secrets connaisseur-tls -o 'go-template={{index .data "tls.crt"}}' | base64 -d
- The certificate used by the webhook. Command:
kubectl get mutatingwebhookconfigurations.admissionregistration.k8s.io connaisseur-webhook -o jsonpath="{.webhooks[*].clientConfig.caBundle}" | base64 -d
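To compare all three in one go, something like this should work (just a sketch; the file names and the pod name are placeholders, the namespace is assumed to be connaisseur, and openssl is assumed to be available):
kubectl -n connaisseur exec <name-of-a-Connaisseur-pod> -- cat certs/tls.crt > cert1.pem
kubectl -n connaisseur get secrets connaisseur-tls -o 'go-template={{index .data "tls.crt"}}' | base64 -d > cert2.pem
kubectl get mutatingwebhookconfigurations.admissionregistration.k8s.io connaisseur-webhook -o jsonpath="{.webhooks[*].clientConfig.caBundle}" | base64 -d > cert3.pem
# identical SHA-256 fingerprints mean the certificates match
for f in cert1.pem cert2.pem cert3.pem; do openssl x509 -in "$f" -noout -fingerprint -sha256; done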
My guess would be that certificates 1 and 3 do not match. Is that the case?
Hi @phbelitz, I will try to get the values. For my clarification, can you help me understand this behaviour? According to Azure:
AKS start operations will restore all objects from ETCD, with the exception of standalone pods, with the same names and ages, meaning that a pod's age will continue to be calculated from its original creation time. This count will keep increasing over time, regardless of whether the cluster is in a stopped state.
So in this case, when AKS starts, it restores all the pods (we deploy Connaisseur via a Deployment, not as a standalone pod) including its service. If so, why is the Connaisseur service alone failing?
In theory you are right, but it also depends on your setup. If you just install Connaisseur and then restart the cluster, normally your cluster would brick itself, since Connaisseur will block everything, including itself. But since you are getting an SSL error, I presume you have some countermeasure in place to prevent that kind of behavior. My guess would be that those countermeasures have something to do with it ... 🤷
But here is some more explanation of how Connaisseur works (or gets installed) in regard to your SSL problem, so you can maybe investigate where exactly the problem lies:
Some preface: Connaisseur consists (among other resources) of a webhook configuration and a service with some pods. The webhook configuration essentially connects Connaisseur to the kube API, so that image verification happens once specific resources are created. Kubernetes enforces TLS communication with admission controllers, meaning the Connaisseur service+pods must have TLS (usually self-signed), and the webhook configuration needs to know the service's certificate authority (CA), so that proper TLS communication between the kube API and the Connaisseur service can happen. Your problem most likely is that the CA configured in the webhook configuration doesn't match the one of the service+pods, as already mentioned in my earlier comment.
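You can see that link in the webhook configuration itself, e.g. (output trimmed to the relevant fields; values are placeholders):
kubectl get mutatingwebhookconfigurations.admissionregistration.k8s.io connaisseur-webhook -o yaml
# webhooks:
# - name: connaisseur-svc.connaisseur.svc
#   clientConfig:
#     service:
#       name: connaisseur-svc
#       namespace: connaisseur
#       path: /mutate
#     caBundle: <base64-encoded CA that must verify the cert the pods serve>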
That could happen when the secret containing the TLS key+cert changes but the webhook configuration does not. Why that would happen, I don't know 🤷 Maybe also as extra information: Connaisseur generates its own TLS key+cert when freshly installed, but reuses an already present TLS secret in the cluster when being reapplied (e.g. through helm upgrade).
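If the secret's certificate is the one the pods actually serve, one way to bring the webhook back in sync could be to copy it into the caBundle (a sketch, untested; assumes a single webhook at index 0 and the default resource names):
# .data "tls.crt" is already base64, which is exactly what caBundle expects
CA=$(kubectl -n connaisseur get secret connaisseur-tls -o 'go-template={{index .data "tls.crt"}}')
kubectl patch mutatingwebhookconfigurations.admissionregistration.k8s.io connaisseur-webhook \
  --type=json -p "[{\"op\": \"replace\", \"path\": \"/webhooks/0/clientConfig/caBundle\", \"value\": \"$CA\"}]"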
As explained, if this is a problem caused by restarting the cluster, then there is little we can do 🤷 If this is a Python problem, where the cheroot server can't properly load the certificate, then it is fixed, as we no longer use Python but Go.
Closing this issue.