Caddy Mesh

Caddy service mesh based on the host/node architecture.

(Architecture diagram)

Features

  • Timeouts
  • Retries
  • Rate limiting
  • Traffic splitting

Installation

Prerequisites

Build the image for the Caddy Mesh controller:

$ make build-image tag=v0.1.0

If using any plugins, you need to build the Caddy image locally:

$ make build-caddy-image tag=2.6.0-beta.3-custom

Install the Helm Chart

$ make helm-install

Configuration

All features provided by Caddy Mesh can be enabled by using annotations on Kubernetes services.

Timeouts

Timeouts can be enabled by using the following annotations:

mesh.caddyserver.com/timeout-dial-timeout: "<duration>"
mesh.caddyserver.com/timeout-read-timeout: "<duration>"
mesh.caddyserver.com/timeout-write-timeout: "<duration>"

Parameters:

  • timeout-dial-timeout: How long to wait before timing out trying to connect to an upstream. Default: 3s. (See dial_timeout.)
  • timeout-read-timeout: The maximum time to wait for the next read from the backend. Default: no timeout. (Requires Caddy v2.6.0-beta.3.)
  • timeout-write-timeout: The maximum time to wait for the next write to the backend. Default: no timeout. (Requires Caddy v2.6.0-beta.3.)
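
For example, the following Service enables all three timeouts (a minimal sketch: the annotation keys come from this section, while the service name, namespace, and duration values are illustrative assumptions):

---
kind: Service
apiVersion: v1
metadata:
  name: server
  namespace: test
  labels:
    app: server
  annotations:
    mesh.caddyserver.com/timeout-dial-timeout: "5s"     # give up connecting to an upstream after 5s
    mesh.caddyserver.com/timeout-read-timeout: "30s"    # requires Caddy v2.6.0-beta.3
    mesh.caddyserver.com/timeout-write-timeout: "30s"   # requires Caddy v2.6.0-beta.3
spec:
  selector:
    app: server
  ports:
    - port: 80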

Retries

Retries can be enabled by using the following annotations:

mesh.caddyserver.com/retry-count: "<count>"
mesh.caddyserver.com/retry-duration: "<duration>"
mesh.caddyserver.com/retry-on: "<expression>"

Parameters:

  • retry-count: How many times to retry selecting available backends for each request if the next available host is down. Default: disabled. (Requires Caddy v2.6.0-beta.3.)
    • If retry-duration is also configured, then retries may stop early if the duration is reached.
  • retry-duration: How long to try selecting available backends for each request if the next available host is down. Default: disabled. (See try_duration.)
  • retry-on: An expression matcher that restricts which requests are allowed to be retried. Default: "". (See retry_match.)
    • If either retry-count or retry-duration is specified, retry-on will default to "true".
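
As an illustration, the annotations below allow up to 2 retries within 5s, limited to GET requests (a minimal sketch: the annotation keys come from this section; the service name, the values, and the method('GET') expression are illustrative assumptions):

---
kind: Service
apiVersion: v1
metadata:
  name: server
  namespace: test
  labels:
    app: server
  annotations:
    mesh.caddyserver.com/retry-count: "2"            # requires Caddy v2.6.0-beta.3
    mesh.caddyserver.com/retry-duration: "5s"        # stop retrying once 5s have elapsed
    mesh.caddyserver.com/retry-on: "method('GET')"   # only retry (idempotent) GET requests; illustrative expression
spec:
  selector:
    app: server
  ports:
    - port: 80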

Rate Limiting

Rate limiting can be enabled by using the following annotations:

mesh.caddyserver.com/rate-limit-key: "<key>"
mesh.caddyserver.com/rate-limit-rate: "<rate>"
mesh.caddyserver.com/rate-limit-zone-size: "<zone_size>"

Note that this feature requires the caddy-ext/ratelimit plugin.
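
For example (a minimal sketch: the annotation keys come from this section, while the service name and the key, rate, and zone-size values are assumptions based on the caddy-ext/ratelimit plugin's configuration format):

---
kind: Service
apiVersion: v1
metadata:
  name: server
  namespace: test
  labels:
    app: server
  annotations:
    mesh.caddyserver.com/rate-limit-key: "{http.request.remote.host}"   # limit per client IP (assumed key placeholder)
    mesh.caddyserver.com/rate-limit-rate: "100r/m"                      # 100 requests per minute (assumed rate format)
    mesh.caddyserver.com/rate-limit-zone-size: "10000"                  # assumed zone size
spec:
  selector:
    app: server
  ports:
    - port: 80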

Traffic Splitting

Traffic splitting can be enabled by using the following annotations:

mesh.caddyserver.com/traffic-split-expression: "<expression>"
mesh.caddyserver.com/traffic-split-new-service: "<name>"
mesh.caddyserver.com/traffic-split-old-service: "<name>"

Parameters:

  • traffic-split-expression: An expression matcher that selects which requests are redirected to the new service; unmatched requests go to the old service. Default: "".
  • traffic-split-new-service: The name of the new Kubernetes Service. Default: "".
  • traffic-split-old-service: The name of the old Kubernetes Service. Default: "".

Workflow

(This workflow is inspired by SMI TrafficSplit.)

In this example workflow, the user has previously created the following resources:

  • Deployment named server-v1, with labels: app: server and version: v1.
  • Service named server, with a selector of app: server.
  • Service named server-v1, with selectors: app: server and version: v1.
  • Clients use the FQDN of server to communicate.
    • To leverage Caddy Mesh, clients must use server.test.caddy.mesh (instead of server.test.svc.cluster.local).
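
For illustration, a client workload would then point at the mesh DNS name rather than the cluster-local one (a minimal sketch: the server.test.caddy.mesh name comes from the item above, while the Deployment name, image, and environment variable are hypothetical):

---
kind: Deployment
apiVersion: apps/v1
metadata:
  name: client
  namespace: test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: client
  template:
    metadata:
      labels:
        app: client
    spec:
      containers:
        - name: client
          image: example/client:latest                    # hypothetical client image
          env:
            - name: SERVER_URL                            # hypothetical variable read by the client
              value: "http://server.test.caddy.mesh"      # mesh FQDN instead of server.test.svc.cluster.local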

In order to update an application, the user will perform the following actions:

  • Enable traffic splitting on server (without redirecting any traffic to server-v2 yet).

    ---
    kind: Service
    apiVersion: v1
    metadata:
      name: server
      namespace: test
      labels:
        app: server
    + annotations:
    +   mesh.caddyserver.com/traffic-split-expression: "false"
    +   mesh.caddyserver.com/traffic-split-new-service: server-v2
    +   mesh.caddyserver.com/traffic-split-old-service: server-v1
    spec:
      ...
  • Create a new deployment named server-v2, with labels: app: server and version: v2.

  • Create a new service named server-v2, with selectors: app: server and version: v2.

  • Once the deployment is healthy, spot-check it by sending manual requests to the server-v2 service.

When ready, the user begins to redirect traffic to server-v2:

  • For example, the user first routes Chrome users to server-v2:

    ---
    kind: Service
    apiVersion: v1
    metadata:
      name: server
      namespace: test
      labels:
        app: server
      annotations:
    -   mesh.caddyserver.com/traffic-split-expression: "false"
    +   mesh.caddyserver.com/traffic-split-expression: "header({'User-Agent': '*Chrome*'})"
        mesh.caddyserver.com/traffic-split-new-service: server-v2
        mesh.caddyserver.com/traffic-split-old-service: server-v1
    spec:
      ...
  • Verify health metrics and become comfortable with the new version.

  • The user decides to redirect all traffic to the new version:

    ---
    kind: Service
    apiVersion: v1
    metadata:
      name: server
      namespace: test
      labels:
        app: server
      annotations:
    -   mesh.caddyserver.com/traffic-split-expression: "header({'User-Agent': '*Chrome*'})"
    +   mesh.caddyserver.com/traffic-split-expression: "true"
        mesh.caddyserver.com/traffic-split-new-service: server-v2
        mesh.caddyserver.com/traffic-split-old-service: server-v1
    spec:
      ...

When complete, clean up the old resources:

  • Delete the old server-v1 deployment.
  • Delete the old server-v1 service.
  • Remove the traffic splitting annotations as they are no longer needed.