egressgateway

Abstract

The gateway provides network egress capabilities for Kubernetes clusters.

  • Supports IPv4/IPv6 dual-stack connectivity.
  • Provides high availability for Egress Nodes.
  • Allows filtering Pod egress traffic by Egress Policy (destination CIDR).
  • Allows filtering by egress application (Pods).
  • Can be used on low kernel versions.

Background

Starting in 2021, we received feedback such as the following.

There are two clusters, A and B. Cluster A is VMware-based and mainly runs database workloads, and Cluster B is a Kubernetes cluster. Some applications in Cluster B need to access the database in Cluster A, and the network administrator wants the egress traffic of the cluster's Pods to be managed through an egress gateway.

Proposal

CRDs

The egress gateway model abstracts three Custom Resource Definitions (CRDs): EgressGateway, EgressNode and EgressGatewayPolicy. They are cluster-scoped CRDs.

EgressGateway

apiVersion: egressgateway.spidernet.io/v1
kind: EgressGateway
metadata:
  name: "egressgateway"
spec:
  nodeSelector:
    matchLabels:
      egress: "true"
status:
  forwardMethod: "active-passive"
  nodeList: 
    - node1:
        status: "ready"
        active: true
        interfaces:
        - eth0:
            ipv4: ["10.6.0.10/16"]
            ipv6: ["fd::10/64"]
  • spec
  • status
    • forwardMethod field is synced from the ConfigMap configuration.
    • nodeList field is the list of nodes matched by nodeSelector.
      • status field represents the node status, which may be Ready, NotReady or Unknown.
        • Only nodes in the Ready state can participate in the election of egress gateway nodes.
      • active field indicates that this node is the one through which non-egress-gateway nodes reach the destination CIDR (e.g. the Cluster A CIDR in picture 1), whether that reconciliation is still in progress or already complete.
      • interfaces is the list of physical network interfaces. It is updated by the Agent.
        • ipv4 is the list of IPv4 addresses.
        • ipv6 is the list of IPv6 addresses.
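
The rule that only Ready nodes take part in the election can be sketched in Go. This is a minimal illustration, not the project's code: the NodeStatus type, the electActive helper, and the tie-break (keep a Ready node that is already active, otherwise pick the Ready node with the smallest name) are assumptions made for the example.

```go
package main

import (
	"fmt"
	"sort"
)

// NodeStatus mirrors one entry of status.nodeList (hypothetical type).
type NodeStatus struct {
	Name   string
	Status string // "Ready", "NotReady" or "Unknown"
	Active bool
}

// electActive returns the name of the node chosen as the active gateway.
// Only Ready nodes are candidates. A Ready node that is already active is
// kept; otherwise the Ready node with the smallest name wins (assumed rule).
func electActive(nodes []NodeStatus) (string, bool) {
	var ready []string
	for _, n := range nodes {
		if n.Status != "Ready" {
			continue
		}
		if n.Active {
			return n.Name, true
		}
		ready = append(ready, n.Name)
	}
	if len(ready) == 0 {
		return "", false
	}
	sort.Strings(ready)
	return ready[0], true
}

func main() {
	nodes := []NodeStatus{
		{Name: "node2", Status: "Ready"},
		{Name: "node1", Status: "NotReady", Active: true},
		{Name: "node3", Status: "Ready"},
	}
	name, ok := electActive(nodes)
	fmt.Println(name, ok) // prints "node2 true"
}
```

Here node1 is skipped even though it was active, because it is no longer Ready.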

EgressNode

apiVersion: egressgateway.spidernet.io/v1
kind: EgressNode
metadata:
  name: "node1"
spec:
status:
  phase: "Succeeded"
  vxlanIPv4IP: "172.31.0.10/16"
  vxlanIPv6IP: "fe80::/64"
  tunnelMac: "xx:xx:xx:xx:xx:xx"
  physicalInterface: "eth1"
  physicalInterfaceIPv4: ""
  physicalInterfaceIPv6: ""

The EgressNode CRD stores VXLAN tunnel information, which is generated by the Controller from the Node CR.

  • status
    • phase indicates the status of the EgressNode. 'Succeeded' means the tunnel IP address has been assigned and the tunnel has been built, 'Pending' means it is waiting for IP assignment, 'Init' means the tunnel IP address was assigned successfully, and 'Failed' means the tunnel IP address could not be assigned.
    • vxlanIPv4IP field represents the IPv4 address of the VXLAN tunnel.
    • vxlanIPv6IP field represents the IPv6 address of the VXLAN tunnel.
    • tunnelMac field represents the MAC address of the IPv4 VXLAN tunnel interface.
    • physicalInterface is the name of the parent interface of the VXLAN tunnel interface.
    • physicalInterfaceIPv4 is the IPv4 address of the parent interface of the VXLAN tunnel interface.
    • physicalInterfaceIPv6 is the IPv6 address of the parent interface of the VXLAN tunnel interface.
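
The four phase values can be modelled as a small Go type. This is a sketch under stated assumptions: the Phase type, the constant names, and the phaseOf helper with its three boolean inputs are hypothetical, and the ordering Pending, Init, Succeeded is inferred from the descriptions above rather than taken from the project's code.

```go
package main

import "fmt"

// Phase is the value of status.phase on an EgressNode.
type Phase string

const (
	PhasePending   Phase = "Pending"   // waiting for tunnel IP assignment
	PhaseInit      Phase = "Init"      // tunnel IP assigned, tunnel not built yet
	PhaseSucceeded Phase = "Succeeded" // tunnel IP assigned and tunnel built
	PhaseFailed    Phase = "Failed"    // tunnel IP assignment failed
)

// phaseOf derives the phase from three observations (hypothetical inputs).
func phaseOf(assignFailed, ipAssigned, tunnelBuilt bool) Phase {
	switch {
	case assignFailed:
		return PhaseFailed
	case !ipAssigned:
		return PhasePending
	case !tunnelBuilt:
		return PhaseInit
	default:
		return PhaseSucceeded
	}
}

func main() {
	fmt.Println(phaseOf(false, true, false)) // prints "Init"
}
```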

EgressGatewayPolicy

apiVersion: egressgateway.spidernet.io/v1
kind: EgressGatewayPolicy
metadata:
  name: "policy"
spec:
  appliedTo:
    podSelector:
      matchLabels:
        app: "shopping"
      ipv6PodSubnet: "10.0.0.0/16"
      ipv4PodSubnet: "10.0.0.0/16"
  destCIDR: 
   - "10.6.1.0/24"
  • spec
    • podSelector field selects the group of Pods to which the policy applies.
    • podSubnet field specifies the Pod CIDR affected by the egress policy. It conflicts with the podSelector field.
    • destCIDR field is the list of destination CIDR blocks.
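
The constraints above (podSelector and podSubnet conflict, and destCIDR entries must be CIDR blocks) are the kind of check a validator could enforce. The following Go sketch is illustrative only: the Policy struct is a simplified, hypothetical view of the spec, and validate is not the project's webhook.

```go
package main

import (
	"errors"
	"fmt"
	"net/netip"
)

// Policy is a simplified, hypothetical view of EgressGatewayPolicy.spec.
type Policy struct {
	PodSelector map[string]string // appliedTo.podSelector.matchLabels
	PodSubnet   []string          // Pod CIDRs selected instead of labels
	DestCIDR    []string          // destCIDR
}

// validate rejects policies that set both selectors or carry a malformed CIDR.
func validate(p Policy) error {
	if len(p.PodSelector) > 0 && len(p.PodSubnet) > 0 {
		return errors.New("podSelector and podSubnet are mutually exclusive")
	}
	// Copy before appending so the caller's slices are never modified.
	cidrs := append(append([]string{}, p.PodSubnet...), p.DestCIDR...)
	for _, c := range cidrs {
		if _, err := netip.ParsePrefix(c); err != nil {
			return fmt.Errorf("invalid CIDR %q: %w", c, err)
		}
	}
	return nil
}

func main() {
	ok := Policy{
		PodSelector: map[string]string{"app": "shopping"},
		DestCIDR:    []string{"10.6.1.0/24"},
	}
	fmt.Println(validate(ok)) // prints "<nil>"

	bad := ok
	bad.PodSubnet = []string{"10.0.0.0/16"}
	fmt.Println(validate(bad))
}
```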

Datapath

Policy control is implemented with a combination of a VXLAN tunnel, ipset, iptables, and routes.

Non Egress Node

VXLAN

A VXLAN tunnel is built across the cluster nodes. There are two tunnel NICs, named egress-vxlan-v4 and egress-vxlan-v6.

IPSet

sudo ipset create egress-dst-policy-name hash:net
sudo ipset add egress-dst-policy-name 172.16.1.1/32

IPTables

iptables -t mangle -F EGRESSGATEWAY-MARK-REQUEST-POLICY-NAME
iptables -t mangle -X EGRESSGATEWAY-MARK-REQUEST-POLICY-NAME
iptables -t mangle -N EGRESSGATEWAY-MARK-REQUEST-POLICY-NAME

iptables -A EGRESSGATEWAY-MARK-REQUEST-POLICY-NAME \
  -t mangle \
  -m conntrack --ctdir ORIGINAL \
  -m set --match-set egress-dst-policy-name dst \
  -m set --match-set egress-src-policy-name src \
  -j MARK --set-mark 0x11000000 \
  -m comment --comment "rule uuid: mark request packet"
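
An agent written in Go would typically assemble such a rule as an argument list before handing it to iptables. The sketch below shows one way to do that; the markRuleArgs function and the markRequest constant are hypothetical names, and only the flags already shown in the rule above are used.

```go
package main

import (
	"fmt"
	"strings"
)

// markRequest is the mark placed on request packets that match a policy.
const markRequest = 0x11000000

// markRuleArgs builds the iptables arguments for the mark rule of one policy.
// chain, srcSet and dstSet are derived from the policy name by the caller.
func markRuleArgs(chain, srcSet, dstSet, comment string) []string {
	return []string{
		"-t", "mangle", "-A", chain,
		"-m", "conntrack", "--ctdir", "ORIGINAL",
		"-m", "set", "--match-set", dstSet, "dst",
		"-m", "set", "--match-set", srcSet, "src",
		"-m", "comment", "--comment", comment,
		"-j", "MARK", "--set-mark", fmt.Sprintf("0x%08x", markRequest),
	}
}

func main() {
	args := markRuleArgs(
		"EGRESSGATEWAY-MARK-REQUEST-POLICY-NAME",
		"egress-src-policy-name",
		"egress-dst-policy-name",
		"rule uuid: mark request packet",
	)
	// Joined for display only; the comment would need quoting in a shell.
	fmt.Println("iptables", strings.Join(args, " "))
}
```

Keeping the arguments as a slice means they can be passed to exec.Command("iptables", args...) without any shell quoting.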

Route

Normal routing (single next hop).

ip rule add fwmark 0x11000000 table 100
ip route flush table 100
ip route add default via 20.0.0.85 dev egress-vxlan-v4 onlink table 100

Equal-cost multi-path routing.

sysctl -w net.ipv4.fib_multipath_hash_policy=1
ip rule add fwmark 0x11000000 table 100
ip route flush table 100
ip route add table 100 default \
  nexthop via 20.0.0.85 dev egress-vxlan-v4 onlink \
  nexthop via 20.0.0.90 dev egress-vxlan-v4 onlink

Egress Node

iptables -t mangle -I FORWARD 1 -m mark --mark 0x11000000 -j MARK --set-mark 0x12000000 -m comment --comment "egress gateway: change mark"
iptables -t filter -I FORWARD 1 -m mark --mark 0x12000000 -j ACCEPT -m comment --comment "egress gateway: keep mark"
iptables -t filter -I OUTPUT 1 -m mark --mark 0x12000000 -j ACCEPT -m comment --comment "egress gateway: keep mark"
iptables -t mangle -I POSTROUTING 1 -m mark --mark 0x12000000 -j ACCEPT -m comment --comment "egress gateway: keep mark"
iptables -t nat -I POSTROUTING 1 -m mark --mark 0x12000000 -j ACCEPT -m comment --comment "egress gateway: no snat"

CNI Compatibility

Calico

Calico's chainInsertMode must be set to Append, as in the following FelixConfiguration. See the Calico documentation for more details:

apiVersion: projectcalico.org/v3
kind: FelixConfiguration
metadata:
  name: default
spec:
  ipv6Support: false
  ipipMTU: 1400
  chainInsertMode: Append

Implementation

Controller

The Controller consists of a webhook validator and a reconcile flow.

The Controller has two control processes. The first watches the cluster nodes, generates a tunnel IP address and MAC address for each Node, and then creates or updates the EgressNode CR status. The second watches EgressNode and EgressGateway, syncs the list of nodes matched by the label selector, and elects the egress gateway node.
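
The first control process has to produce a tunnel IP address and a MAC address per node. The Go sketch below shows one possible approach and is not the project's allocator: allocateTunnelIP and tunnelMAC are hypothetical helpers, the "first free address" strategy is an assumption, and deriving the MAC from a hash of the node name is a technique swapped in for illustration.

```go
package main

import (
	"crypto/sha1"
	"fmt"
	"net"
	"net/netip"
)

// allocateTunnelIP returns the first address in cidr, after the network
// address, that is not listed in used. It does not reserve the IPv4
// broadcast address; a real allocator would likely need to.
func allocateTunnelIP(cidr string, used []string) (string, error) {
	prefix, err := netip.ParsePrefix(cidr)
	if err != nil {
		return "", err
	}
	taken := make(map[netip.Addr]bool, len(used))
	for _, u := range used {
		addr, err := netip.ParseAddr(u)
		if err != nil {
			return "", err
		}
		taken[addr] = true
	}
	prefix = prefix.Masked()
	for a := prefix.Addr().Next(); prefix.Contains(a); a = a.Next() {
		if !taken[a] {
			return a.String(), nil
		}
	}
	return "", fmt.Errorf("no free address in %s", cidr)
}

// tunnelMAC derives a stable MAC address from the node name. The first
// octet is forced to be locally administered and unicast.
func tunnelMAC(nodeName string) string {
	sum := sha1.Sum([]byte(nodeName))
	mac := net.HardwareAddr(sum[:6])
	mac[0] = (mac[0] | 0x02) & 0xfe
	return mac.String()
}

func main() {
	ip, err := allocateTunnelIP("172.31.0.0/16", []string{"172.31.0.1"})
	if err != nil {
		panic(err)
	}
	fmt.Println(ip, tunnelMAC("node1"))
}
```

A deterministic MAC keeps the tunnel address stable across controller restarts, at the cost of a small collision risk that a real implementation would need to check for.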

Agent

The Agent has two control processes. The first watches the EgressNode CR and manages the node tunnel; the node tunnel is a pluggable interface, so VXLAN can be replaced by Geneve. The second manages the datapath policy: it watches EgressNode, EgressGateway and EgressGatewayPolicy and applies them to the host through the policy interface. The policy interface is currently implemented by a combination of ipset, iptables, and route, and it can be replaced by eBPF.

Go Package (Structure) Design

├── api
│   └── v1
├── charts
├── cmd
│   ├── agent
│   │   ├── cmd
│   │   │   └── root.go
│   │   └── main.go
│   └── controller
│       ├── cmd
│       │   └── root.go
│       └── main.go
├── docs
├── images
├── output
├── pkg
│   ├── config
│   │   └── config.go
│   ├── agent
│   │   ├── agent.go
│   │   ├── egress_gateway_node.go
│   │   ├── egress_node.go
│   │   ├── egress_police.go
│   │   ├── iptables
│   │   │   └── iptables.go
│   │   ├── route
│   │   │   └── route.go
│   │   └── vxlan
│   │       └── vxlan.go
│   ├── controller
│   │   ├── allocator
│   │   │   └── interface.go
│   │   ├── controller.go
│   │   ├── controller_test.go
│   │   ├── egress_gateway_node.go
│   │   ├── node.go
│   │   └── webhook
│   │       ├── mutating.go
│   │       └── validate.go
│   ├── ipset
│   │   ├── ipset.go
│   │   └── types.go
│   ├── k8s
│   ├── lock
│   ├── logger
│   ├── metrics
│   ├── profiling
│   ├── schema
│   └── types
├── test
├── tools
└── vendor

Develop

Refer to develop.