istio/ztunnel

Refactoring improvements now that we have a proxy-per-workload approach

Closed this issue · 4 comments

The original ztunnel design was fully multi-tenant. In the current architecture, we build up a Proxy per workload, so while we run a multi-tenant binary, we have pretty siloed code execution.

Given this, we can make a few improvements IMO. Here is what I am thinking:

  • Each Proxy gets a WorkloadContext
```rust
struct WorkloadContext {
  workload_info: Fetch<Workload>,
  certificate: Fetch<Certificate>,
}
```
  • A new Fetch type (note: there is probably a better name, and maybe an existing implementation) wraps these and exposes async getters. All usages call these getters, which ensures they pick up updates when the value changes and wait if it is not yet available.

This is a cleaner model than passing around a CA that can fetch arbitrary certs but hoping we select the right one (to be clear: we have many checks to ensure we do, it's just awkward), and similar for workload lookup.
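For illustration, here is a minimal sketch of what such a Fetch type could look like, assuming it is backed by a tokio watch channel; the name comes from the proposal above, everything else is hypothetical and not the actual implementation:

```rust
use tokio::sync::watch;

/// Sketch of the proposed Fetch type: a read handle that async callers can
/// await, and that always yields the latest value pushed by the writer.
#[derive(Clone)]
pub struct Fetch<T: Clone> {
    rx: watch::Receiver<Option<T>>,
}

impl<T: Clone> Fetch<T> {
    /// Returns the write side (driven by WDS/CA updates) and the Fetch handle
    /// a Proxy would hold in its WorkloadContext.
    pub fn new() -> (watch::Sender<Option<T>>, Self) {
        let (tx, rx) = watch::channel(None);
        (tx, Fetch { rx })
    }

    /// Async getter: waits until a value is available, then returns the
    /// latest one. Callers re-read on every call, so they see updates.
    pub async fn get(&mut self) -> T {
        let guard = self
            .rx
            .wait_for(|v| v.is_some())
            .await
            .expect("writer dropped");
        guard.as_ref().expect("checked by wait_for").clone()
    }
}
```

Call-sites would then hold a WorkloadContext and await these getters instead of reaching into the CA or the workload store directly.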

The tricky part here is dedicated mode. In inpod, we have a workload description to match up the workload and WDS. For dedicated mode we don't really have one.

Some options:

  • Make the user provide some info... what info? name, namespace, ...?
  • Automatically infer the IP based on local interfaces and match that up (a rough sketch of this follows the list)
  • Some combination of ^
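For the second option, a rough std-only sketch of one way to infer a local IP: connect a UDP socket (no packets are sent) and read back the source address the OS picks. This only finds the primary outbound address; a real implementation would probably enumerate all interfaces, and the WDS lookup itself is not shown.

```rust
use std::io;
use std::net::{IpAddr, UdpSocket};

/// Infer the primary local IP by letting the OS choose an outbound route.
/// connect() on a UDP socket only selects a source address; nothing is sent.
fn primary_local_ip() -> io::Result<IpAddr> {
    let sock = UdpSocket::bind("0.0.0.0:0")?;
    sock.connect("8.8.8.8:80")?;
    Ok(sock.local_addr()?.ip())
}

fn main() -> io::Result<()> {
    let ip = primary_local_ip()?;
    // A dedicated-mode ztunnel could then look this IP up against WDS
    // workload addresses to find the matching workload (lookup not shown).
    println!("would match workload by IP {ip}");
    Ok(())
}
```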

> The original ztunnel design was fully multi-tenant. In the current architecture, we build up a Proxy per workload, so while we run a multi-tenant binary, we have pretty siloed code execution.
>
> Given this, we can make a few improvements IMO. Here is what I am thinking:
> (snip)
> This is a cleaner model than passing around a CA that can fetch arbitrary certs but hoping we select the right one (to be clear: we have many checks to ensure we do, it's just awkward), and similar for workload lookup.

Yep, 👍 that in general would be a bit tidier.

> The tricky part here is dedicated mode. In inpod, we have a workload description to match up the workload and WDS. For dedicated mode we don't really have one.
>
> Some options:
>
> * Make the user provide some info... what info? name, namespace, ...?
> * Automatically infer the IP based on local interfaces and match that up
> * Some combination of ^

How is this accomplished in sidecar mode? How does a proxy know what workload it should be proxying?

> Make the user provide some info... what info? name, namespace, ...?

I'd be inclined to make users specify the same fields in Dedicated mode that we would get from the node agent in Shared mode - that is:

```proto
string name = 1;
string namespace = 2;
string service_account = 3;
```

(we could maybe elide service_account by insisting that ztunnel and the app share the same SA, but I don't think we have to insist on that?)

> How is this accomplished in sidecar mode? How does a proxy know what workload it should be proxying?

We inject a bunch of env vars from the downward API to get the pod name, namespace, pod IP, etc.

Essentially what I am proposing for dedicated mode (maybe), but dedicated mode is probably running off-k8s, so the user will have to set it manually.
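A minimal sketch of what that manual configuration could look like in dedicated mode, assuming the fields are supplied via environment variables; the variable names and struct here are illustrative, not an agreed interface:

```rust
use std::env;

/// The identity a dedicated-mode ztunnel would need the user to supply by
/// hand, mirroring the fields the node agent provides in shared mode.
#[derive(Debug, Clone)]
pub struct DedicatedWorkloadInfo {
    pub name: String,
    pub namespace: String,
    pub service_account: String,
}

impl DedicatedWorkloadInfo {
    /// Read the fields from (hypothetical) environment variables set by the
    /// user, e.g. in a systemd unit or startup script.
    pub fn from_env() -> Result<Self, env::VarError> {
        Ok(Self {
            name: env::var("WORKLOAD_NAME")?,
            namespace: env::var("WORKLOAD_NAMESPACE")?,
            service_account: env::var("WORKLOAD_SERVICE_ACCOUNT")?,
        })
    }
}
```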

FWIW on Istio sidecar VMs, this is how they do it (manually specify)

> FWIW on Istio sidecar VMs, this is how they do it (manually specify)

That's probably a good enough starting point.

+1 to manual setting