cortexlabs/cortex

Host-terminated TLS with Cortex APIs

g-karthik opened this issue · 9 comments

Description

We would like to have host-terminated TLS with our Cortex APIs. At the moment, we are using API Gateway (https://docs.cortex.dev/clusters/networking/https#internal-load-balancer), which is a proxy integration via a private link to the internal network load balancer created by Cortex. For highly confidential data, however, host-terminated TLS is preferable.

It is my understanding that it should indeed be possible to terminate TLS on the underlying pod with k8s using a Service type=LoadBalancer. Could you please assist with this feature (if already supported in Cortex, please point to docs)?

@deliahu @vishalbollu

Hi @g-karthik, you can add an ACM SSL certificate to the API load balancer by using the ssl_certificate_arn field in the cluster configuration (docs). Is that what you had in mind? You would follow this guide to set up your custom domain, and then this guide to add the cert.

Hi @deliahu, while adding an ACM certificate to the API load balancer is a good start, we are looking to encrypt the traffic between the load balancer and the cluster nodes so that traffic is only decrypted in the API pod. Does Cortex support this?

@deliahu I'm no k8s expert, but talking to some k8s folks offline, it seems the following may need some updates to move TLS-termination from the NLB to the pod:

{% if config.get('ssl_certificate_arn', '') != '' %}

https://istio.io/latest/docs/tasks/traffic-management/ingress/secure-ingress/

and that we'd need to get the TLS cert+key into a k8s Secret so the pods can access it, using cert-manager.

At this point TLS termination at the pod isn't supported because various network hops within the cluster assume that the requests being routed are HTTP. Ingress gateway pods augment requests with headers for internal routing and the proxy sidecar attached to Realtime APIs assumes that the requests are HTTP.

Here is a rough overview of the network hops currently and where HTTPS is decrypted:

client --HTTPS--> NLB (decrypted) --HTTP--> cluster nodes --HTTP--> istio ingress pod --HTTP--> API pod (proxy side car) --HTTP--> API pod (your container)
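To make the question of "where is TLS terminated" concrete, here is a minimal Python sketch that models the hops listed above as data. The `Hop` class and the helper functions are hypothetical names used only for illustration; they are not part of Cortex. The sketch assumes that moving the termination point simply means every hop up to that point carries HTTPS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hop:
    source: str
    destination: str
    protocol: str  # "HTTPS" or "HTTP"


# Hops copied from the overview above.
CURRENT_PATH = [
    Hop("client", "NLB", "HTTPS"),
    Hop("NLB", "cluster nodes", "HTTP"),
    Hop("cluster nodes", "istio ingress pod", "HTTP"),
    Hop("istio ingress pod", "API pod (proxy side car)", "HTTP"),
    Hop("API pod (proxy side car)", "API pod (your container)", "HTTP"),
]


def termination_point(path):
    """Return the destination of the last HTTPS hop, i.e. where traffic is decrypted."""
    encrypted = [hop for hop in path if hop.protocol == "HTTPS"]
    return encrypted[-1].destination if encrypted else None


def plaintext_hops(path):
    """Return (source, destination) pairs for every hop that carries plain HTTP."""
    return [(hop.source, hop.destination) for hop in path if hop.protocol == "HTTP"]


def move_termination(path, new_point):
    """Return a copy of the path with HTTPS on every hop up to and including new_point."""
    destinations = [hop.destination for hop in path]
    if new_point not in destinations:
        raise ValueError(f"unknown hop destination: {new_point}")
    cutoff = destinations.index(new_point)
    return [
        Hop(hop.source, hop.destination, "HTTPS" if index <= cutoff else "HTTP")
        for index, hop in enumerate(path)
    ]


if __name__ == "__main__":
    print("terminates at:", termination_point(CURRENT_PATH))
    for candidate in ("istio ingress pod", "API pod (proxy side car)"):
        remaining = plaintext_hops(move_termination(CURRENT_PATH, candidate))
        print(f"terminate at {candidate}: {len(remaining)} plaintext hop(s) remain")
```

In this model, terminating at the proxy side car leaves only the hop between the two containers of the same pod unencrypted.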

If we were to move the TLS termination to the istio ingress pod, which lives inside the cluster, would that satisfy your use case? If not, where is the earliest networking hop where TLS termination can occur in your use case?

Would any of the other cluster configuration settings change your answer to the previous question? For example, you can set the NLB to be internal or force the ec2 nodes to be scheduled in private subnets.

If the API pod (proxy side car) is always running on the same EC2 node as the API pod (your container), then that would be the earliest possible point for TLS termination in our use case.

We do use an internal NLB and run ec2 nodes in private subnets, but we still require end-to-end encryption of our data.

Thanks for sharing the context.

It might be possible for Cortex to automatically issue and manage certificates. Here is one potential design:

  • use cert-manager in the cluster to issue a certificate using Let's Encrypt
  • configure the cluster to satisfy the HTTP-01 challenge
  • mount the certificate into the pod and configure Cortex's proxy to decrypt HTTPS using the certificate and forward HTTP requests

This design makes a few assumptions:

  • one domain per cluster, with the certificate shared across multiple APIs
  • Let's Encrypt will be used as the certificate issuer
  • HTTP-01 challenge will be used

Would this design satisfy your use case, or does your use case require more fine-grained control over certificate issuance and management?
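For reference, the cert-manager side of the design above could look roughly like the following Python sketch, which builds a `ClusterIssuer` (Let's Encrypt, HTTP-01) and a `Certificate` as plain dictionaries and prints them as JSON that `kubectl apply -f -` accepts. The resource kinds and fields follow the cert-manager `cert-manager.io/v1` API. The resource names, namespace, domain, email, and the `istio` ingress class are placeholders, and how Cortex would actually wire this up is an assumption, not something Cortex supports today.

```python
import json


def letsencrypt_issuer(name, email, ingress_class):
    """Build a cert-manager ClusterIssuer that uses Let's Encrypt with an HTTP-01 solver."""
    return {
        "apiVersion": "cert-manager.io/v1",
        "kind": "ClusterIssuer",
        "metadata": {"name": name},
        "spec": {
            "acme": {
                "server": "https://acme-v02.api.letsencrypt.org/directory",
                "email": email,
                # Secret where cert-manager stores the ACME account key.
                "privateKeySecretRef": {"name": f"{name}-account-key"},
                "solvers": [{"http01": {"ingress": {"class": ingress_class}}}],
            }
        },
    }


def cluster_certificate(name, namespace, domain, issuer_name):
    """Build a Certificate for the single cluster domain, shared across APIs."""
    return {
        "apiVersion": "cert-manager.io/v1",
        "kind": "Certificate",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # cert-manager writes tls.crt and tls.key into this Secret.
            "secretName": f"{name}-tls",
            "dnsNames": [domain],
            "issuerRef": {"name": issuer_name, "kind": "ClusterIssuer"},
        },
    }


if __name__ == "__main__":
    # All values below are placeholders for illustration.
    manifests = {
        "apiVersion": "v1",
        "kind": "List",
        "items": [
            letsencrypt_issuer("letsencrypt", "ops@example.com", "istio"),
            cluster_certificate("cortex-apis", "default", "api.example.com", "letsencrypt"),
        ],
    }
    print(json.dumps(manifests, indent=2))
```

The resulting Secret (here `cortex-apis-tls`) is what would then be mounted into the API pods.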

Hi @vishalbollu,

I will have to do a bit more research to see if this approach would work for us. There may be concerns with the usage of Let's Encrypt, and we may want to use our own API proxy pod.

Another alternative could be to terminate SSL in the istio pod but then open another encrypted connection between istio and the API pod (potentially using mTLS). I will also have to research whether we could use this approach.

Thank you for looking into this.

The automatic management of certs, although related, could be a separate feature. It sounds like that may not necessarily help your use case.

It looks like the minimum work that Cortex can do to support your use case is:

  • Route HTTPS traffic to the pod
  • Provide a way to mount Kubernetes secrets containing the certificates managed by you into your pod. Containers in the pod, either Cortex's proxy or your own proxy sidecar, can use this certificate to decrypt HTTPS traffic.

Let me know if this is a fair assessment.
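As a rough illustration of the second point, a proxy container could load the mounted certificate like this. This is only a sketch: it assumes the secret is of type `kubernetes.io/tls`, so the files are named `tls.crt` and `tls.key`, and the mount directory is a placeholder chosen by whoever configures the pod. The function names are hypothetical, and request forwarding to the API container is left out.

```python
import ssl
from pathlib import Path

# A kubernetes.io/tls secret stores the pair under these keys, which become
# file names when the secret is mounted as a volume.
CERT_FILE = "tls.crt"
KEY_FILE = "tls.key"


def tls_paths(mount_dir):
    """Return the expected certificate and key paths inside the secret mount."""
    mount = Path(mount_dir)
    return mount / CERT_FILE, mount / KEY_FILE


def server_context(mount_dir):
    """Build a server-side TLS context from the mounted certificate and key."""
    cert, key = tls_paths(mount_dir)
    missing = [str(path) for path in (cert, key) if not path.is_file()]
    if missing:
        raise FileNotFoundError(f"missing TLS material: {', '.join(missing)}")
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    context.load_cert_chain(certfile=str(cert), keyfile=str(key))
    return context
```

A proxy would then wrap its listening socket with `server_context(mount_dir).wrap_socket(sock, server_side=True)` and forward the decrypted requests over HTTP to the API container on localhost.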

@vishalbollu Yeah that sounds like a fair assessment toward ensuring host-terminated TLS with managed certificates via AWS Certificate Manager.