envoyproxy/envoy

TLS: Server side SNI

vixns opened this issue ยท 27 comments

vixns commented

The current TLS context implementation only support one certificate / key file pair.
Many use cases requires having several certificates files, so it'll be great if we can specify folders in TLS contexts.
We should be able to add / update any file in these folders without restarting.

Can you expand a bit more? How would the different certificates be chosen? SNI? We don't currently support SNI server side either.

vixns commented

Yep, SNI, for TLS terminaison.
The idea is to be able to drop a key/cert file in a folder, for a new or to replace an existing certificate, and have them reloaded without the need of changing command line arguments or restart envoy.

We won't support automatic reloading when certs change, and we probably will not support loading an entire folder of certs (since certs need to be matched to their SNI identifier), but we will support server side SNI eventually. Patches would definitely be welcome on this one.

In addition to SNI for HTTPS, it would be nice to have support for TCP proxy clusters. Potential setups:

  • Communication with cluster is done via HTTP (TLS termination)
  • Communication with cluster is done via HTTPS (reencryption)
  • Communication with cluster is done via TCP (TLS terminated TCP proxy)
  • Communication with cluster is done via TCP with TLS (TCP proxy with reencryption, not pictured)
  • Communication with cluster is done via raw TCP passthrough (TCP passthrough)

Diagram of options

@PiotrSikora - are you actively working on this?

@mattwoodyard nope, I didn't want to touch it before LDS is finalized, since it's most likely going to result in significant changes to the design.

I need this for some internal migration support. Here's my proposal:

  • Allow configuration which can specify an array of ssl_context.
  • ServerContextPtr becomes shared_ptr
  • A std::unordered_map is added to ContextManager which maps (listen_address, hostnames) to contexts.
  • ContextManager gets an added function to return a context given a hostname and listener.
  • When loading the listener the first context initialized is attached to the current ssl_context_ and used as the default for non SNI based connections.
  • The tlsext_server_name callback will use the ContextManager name map to find the 'real context'.
  • the default case of a single cert in a context can be noop'd by not binding the tlsext_server_name callback.

Any problems with this approach?

@mattwoodyard I would much prefer that we have a single logical SslContext, and just handle SNI inside of it via multiple certs, etc. Why do we need multiple contexts? I think just doing it inside the existing context object is much simpler.

That will certainly work. I'll extend the schema to have a list of certificates for a ssl_context and then do all of the work in SslContext instead of ContextManager.

Support for SNI and multiple certificates would be required before we could consider using Envoy as an edge load balancer for North-South traffic in Cloud Foundry. Wildcard certs are not allowed in many regulated industries, and SAN certificates are so painful to manage that operators end up refusing to support custom domains.

Given the current config structure, I would find it most intuitive if certs were configured within the ssl_context for each listener, though for our current needs we don't need more than one TLS listener per instance. Suggestion; replace cert_chain_file and private_key_file with something like pems which takes an array of certificate chains in the PEM format.

This feature is not hard to add. Someone just needs to do it. So whoever wants to firmly sign up just say so.

@mattwoodyard are you working on this actively? If not I'm probably just going to get this done as I'm getting tired of people asking for it. :)

@mattklein123 - I've not got anything ready to commit, and I won't have the time to dedicate to tests and polish for a few weeks.

@mattwoodyard alright let me see where I get with this. Let me know before you spend a bunch of time doing more work.

I would much prefer that we have a single logical SslContext, and just handle SNI inside of it via multiple certs, etc. Why do we need multiple contexts? I think just doing it inside the existing context object is much simpler.

@mattklein123: v2 API uses multiple FilterChain, so we're going to have multiple contexts per listener in v2 anyway.

Also, it's a bit unfortunate that we're discussing implementation details here, instead of focusing on the use cases, since proper SNI support doesn't stop at simply serving different certificates based on the SNI (which is what's been discussed here so far), but it should change filters, routing rules, cluster selection, etc., otherwise it's kind of pointless.

Also, it's a bit unfortunate that we're discussing implementation details here, instead of focusing on the use cases, since proper SNI support doesn't stop at simply serving different certificates based on the SNI (which is what's been discussed here so far), but it should change filters, routing rules, cluster selection, etc., otherwise it's kind of pointless.

I don't really agree with this. 95%+ of the use cases for SNI are already supported if we just do the TLS part. Basically, cert per domain, and then virtual hosts per domain in a single listener. I would recommend starting there. Agree there is a lot more that can be done later.

v2 API uses multiple FilterChain, so we're going to have multiple contexts per listener in v2 anyway.

Maybe, maybe not. This needs more thinking/design. For example, will we route to logical listeners based on SNI? If so we still have a single SSL context before filter stack routing happens, right?

Although an SNI hostname could be used to make a routing decision based on a domain name, this is not why we need SNI. The primary use case is reducing the burden on the operator for certificate management and secondarily, improved security from not exposing unexpected domain names in a cert.

As our users can construct routing rules based on more complex HTTP attributes than hostname (e.g. context path), we have to decrypt the request to get at the HTTP payload anyway.

@mattklein123 sure, but single SSL context means that client certificate verification settings would apply to everything on a particular listener, making it all or nothing for a particular listening port.

@PiotrSikora ok that make sense. I guess we could use the original design that @mattwoodyard proposed. I guess really all I mean is that IMO without very much work we can unblock a lot of people / make them happy, so it would be best to do that before looking at all of the more complex cases. If we want to work on multiple ssl_context at the config level and we think that is better, that's fine with me. I was just trying to make it as simple as possible.

(I do think that the "ssl_context" at the config level could be a container for multiple things inside of it, but it's probably not worth doing that vs. just doing an array and if a single object is supplied just make it an array for back-compat).

OK I just synced offline with @PiotrSikora. I agree with him that my suggestion above of not doing multiple ssl_context is not a good idea and roughly the original design that @mattwoodyard proposed is a better idea. The goal here will be to optionally have a map of SNI names to ssl_context configurations in the v1 configuration (along with the current default context). All of this code should then be usable in the new v2 API.

@PiotrSikora do you want to take on ^ ?

Any news as to how implementation is progressing?

SNI support is scheduled for Istio 0.3 (end of November) but is blocked by this. Will support in Envoy land before then?

@PiotrSikora can you potentially update us on progress/timelines? Thank you.

@rochdev yeah, I'm actively working on it... next week at the latest.

@PiotrSikora I'm going to leave this ticket resolved, as we have other tickets on fully supporting multiple filter chains.