/kne

Kubernetes Network Emulation

This is not an officially supported Google product.

Goal

For network emulation, there are many approaches that use VMs to emulate a hardware router. Arista, Cisco, Juniper, Drivenets, and Nokia each have multiple implementations of their network operating systems and various generations of hardware emulation. These systems work well for most validation of vendor control plane implementations and for limited data plane certification. The idea of this project is to provide a standard "interface" so that vendors can produce a standard container implementation that can be used to build complex topologies.

  • Have standard lifecycle management infrastructure for allowing multiple vendor device emulations to be present in a single "topology"
  • Allow for control plane access via standard k8s networking
  • Provide a common networking interface for the forwarding plane between network pods.
    • Data plane wires between pods
    • Control plane wires between topology manager
  • Define service implementation for allowing interaction with the topology manager service.
    • Topology manager is the public API for allowing external users to manipulate the link state in the topology.
    • The topology manager will run as a service in k8s environment.
    • It will provide a gRPC interface for tests to interact with.
    • It will listen to CRDs published via the network device pods for discovery
  • Data plane connections for connectivity between pods must be a public transport mechanism
    • This can't be implemented as just exposing "x eth devices on the pod" because Linux doesn't understand the associated control messages which are needed to make this work like a wire.
    • Transceiver state, optical characteristics, wire state, packet filtering / shaping / drops
    • LACP or other port aggregation protocols or APS cannot be simulated correctly
    • The topology manager will start a topology agent on each host for the pod to directly interact with.
    • The topology agent will provide the connectivity between nodes
  • Define how pods boot an initial configuration
    • Ideally, this method would allow for dynamic
  • Define how pods express services for use in-cluster as well as external services
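As a rough illustration of the goals above, the following Go sketch models a multi-vendor topology whose links are explicit "wires" with their own state, which a topology manager could manipulate on behalf of a test. Every type and function name here (`Topology`, `Endpoint`, `SetLinkState`, and so on) is hypothetical and invented for this example; it is not the KNE API or its proto definitions.

```go
package main

import (
	"errors"
	"fmt"
)

// LinkState is a hypothetical state for an emulated wire.
type LinkState int

const (
	LinkUp LinkState = iota
	LinkDown
)

// String renders the state for log output.
func (s LinkState) String() string {
	if s == LinkUp {
		return "UP"
	}
	return "DOWN"
}

// Endpoint names one interface on one node (hypothetical type).
type Endpoint struct {
	Node string
	Intf string
}

// Link is a point-to-point data plane wire between two endpoints.
type Link struct {
	A, Z  Endpoint
	State LinkState
}

// Node is one emulated device; Vendor is a free-form label in this sketch.
type Node struct {
	Name   string
	Vendor string
}

// Topology groups devices from any number of vendors and the wires between them.
type Topology struct {
	Name  string
	Nodes map[string]Node
	Links []*Link
}

// ErrNoLink is returned when no wire terminates at the requested endpoint.
var ErrNoLink = errors.New("no link at endpoint")

// NewTopology returns an empty topology.
func NewTopology(name string) *Topology {
	return &Topology{Name: name, Nodes: map[string]Node{}}
}

// AddNode registers a device under its name.
func (t *Topology) AddNode(name, vendor string) {
	t.Nodes[name] = Node{Name: name, Vendor: vendor}
}

// AddLink wires two endpoints together; both nodes must already exist.
func (t *Topology) AddLink(a, z Endpoint) error {
	for _, ep := range []Endpoint{a, z} {
		if _, ok := t.Nodes[ep.Node]; !ok {
			return fmt.Errorf("unknown node %q", ep.Node)
		}
	}
	t.Links = append(t.Links, &Link{A: a, Z: z, State: LinkUp})
	return nil
}

// SetLinkState changes the state of the wire attached to ep, the kind of
// operation a test might request from a topology manager.
func (t *Topology) SetLinkState(ep Endpoint, s LinkState) error {
	for _, l := range t.Links {
		if l.A == ep || l.Z == ep {
			l.State = s
			return nil
		}
	}
	return ErrNoLink
}

func main() {
	topo := NewTopology("lab")
	topo.AddNode("r1", "vendor-a")
	topo.AddNode("r2", "vendor-b")
	r1 := Endpoint{Node: "r1", Intf: "eth1"}
	r2 := Endpoint{Node: "r2", Intf: "eth1"}
	if err := topo.AddLink(r1, r2); err != nil {
		fmt.Println("add link:", err)
		return
	}
	// Taking the wire down from either end affects the same link.
	if err := topo.SetLinkState(r2, LinkDown); err != nil {
		fmt.Println("set state:", err)
		return
	}
	fmt.Println("link state:", topo.Links[0].State)
}
```

Modelling the wire as its own object, rather than as a pair of independent interfaces, is what lets link state be changed in one place and observed from both ends.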

Use Cases

Test Development

The main use case of this infrastructure is for the development of tests to validate control plane / configuration of network devices without needing real hardware.

In particular, we are interested in the ability to bring up arbitrary topologies that represent a production topology. This requires devices from multiple vendors as well as traffic generation and end hosts.

To support this testing, we need to be able to provide every tester, engineer, and continuous automated run with a set of environments for validating the test scenarios used in production. These environments can also be used to pre-validate hardware testing. This can reduce cycle time, because the virtual testbed is not subject to the contention that affects the hardware testbed. It also allows for "unit testing" the integration test.

Software Development

For the development of new services, or to offer developers of existing services a better environment, virtual testbeds would allow for better scaling of resources and for easier-to-use testbeds customized to a team's needs. Specifically, workflow automation struggles to obtain physical representations of the metros against which workflows need to be validated. A virtual testbed would allow the majority of workflows to be validated against any number of production topologies.

Usage

See the collection of docs for in-depth guides on how to use Kubernetes Network Emulation (KNE).

Disclaimers

Usage Metrics Reporting

The KNE CLI optionally collects anonymous usage metrics. This is turned OFF by default. We use the metrics to gauge the health and performance of various KNE operations (e.g. cluster deployment, topology creation) on an opt-in basis. There is a global flag --report_usage that, when provided, shares anonymous details about certain KNE CLI commands. Collected data can be seen in the event proto definition. Usage metrics are NOT shared by default.

Additionally, the PubSub project and topic that the events are published to are configurable. If you want to track your own private metrics about your KNE usage, you can do so by providing a Cloud PubSub project/topic of your choosing. Full details about how and when usage events are published can be found in the codebase.

We appreciate usage metric reporting, as it helps us develop a better KNE experience for all of our users. Examples include detecting an abnormally high number of cluster deployment failures caused by an upgrade to an underlying dependency introduced by a new commit, or detecting a bug in a scenario where the failure rate for topologies over n links is far greater than for topologies with n-1 links. Usage metric reporting is a helpful tool for the KNE developers.
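The opt-in behaviour described above can be sketched in Go as a reporter that drops every event unless reporting was explicitly enabled. Only the `--report_usage` flag name comes from this document; the `Event`, `Publisher`, `Reporter`, and `memorySink` types are hypothetical stand-ins written for this example, and the in-memory sink takes the place of a real Cloud PubSub publisher. This is not how the KNE codebase is necessarily structured.

```go
package main

import (
	"flag"
	"fmt"
)

// Event is a hypothetical usage record; the real fields live in KNE's event proto.
type Event struct {
	Command string
	Success bool
}

// Publisher abstracts the destination for events (for example a PubSub topic).
type Publisher interface {
	Publish(Event) error
}

// Reporter forwards events to Sink only when Enabled is true.
type Reporter struct {
	Enabled bool
	Sink    Publisher
}

// Report is a no-op unless reporting was opted into and a sink is configured.
func (r *Reporter) Report(e Event) error {
	if r == nil || !r.Enabled || r.Sink == nil {
		return nil
	}
	return r.Sink.Publish(e)
}

// memorySink records events in memory so the sketch runs without any network access.
type memorySink struct {
	events []Event
}

// Publish appends the event to the in-memory list.
func (m *memorySink) Publish(e Event) error {
	m.events = append(m.events, e)
	return nil
}

func main() {
	// The default is false, so nothing is shared unless the user asks for it.
	reportUsage := flag.Bool("report_usage", false, "share anonymous usage metrics (opt-in)")
	flag.Parse()

	sink := &memorySink{}
	reporter := &Reporter{Enabled: *reportUsage, Sink: sink}
	if err := reporter.Report(Event{Command: "create", Success: true}); err != nil {
		fmt.Println("report failed:", err)
	}
	fmt.Printf("published %d event(s)\n", len(sink.events))
}
```

Keeping the sink behind an interface mirrors the idea that the project and topic are configurable: a private destination can be swapped in without touching the code that emits events.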

Thanks

This project is mainly based on k8s-topo from github.com/networkop/k8s-topo and the meshnet-cni plugin from github.com/networkop/meshnet-cni.