
Node Resource Interface


Node Resource Interface, Revisited


This project is currently in DRAFT status

Goal

NRI allows plugging domain- or vendor-specific custom logic into OCI-compatible runtimes. This logic can make controlled changes to containers or perform extra actions outside the scope of OCI at certain points in a container's lifecycle. This can be used, for instance, for improved allocation and management of devices and other container resources.

NRI defines the interfaces and implements the common infrastructure for enabling such pluggable runtime extensions, NRI plugins. This also keeps the plugins themselves runtime-agnostic.

The goal is to enable NRI support in the most commonly used OCI runtimes, containerd and CRI-O.

Background

The revisited API is a major rewrite of NRI. It changes the scope of NRI and how it gets integrated into runtimes. It reworks how plugins are implemented, how they communicate with the runtime, and what kind of changes they can make to containers.

NRI v0.1.0 used an OCI hook-like one-shot plugin invocation mechanism where a separate instance of a plugin was spawned for every NRI event. This instance then used its standard input and output to receive a request and provide a response, both as JSON data.

In the revisited API, NRI plugins are daemon-like entities. A single instance of a plugin is now responsible for handling the full stream of NRI events and requests. A unix-domain socket is used as the transport for communication. Instead of JSON requests and responses, NRI is defined as a formal, protobuf-based 'NRI plugin protocol' which is compiled into ttRPC bindings. This should result in improved communication efficiency with lower per-message overhead, and enable straightforward implementation of stateful NRI plugins.

Components

The NRI implementation consists of a number of components. The core components are essential for implementing working end-to-end NRI support in runtimes: the actual NRI protocol and the NRI runtime adaptation.

Together these establish the model of how a runtime interacts with NRI and how plugins interact with containers in the runtime through NRI. They also define under which conditions plugins can make changes to containers and the extent of these changes.

The rest of the components are the NRI plugin stub library and some sample NRI plugins. Some plugins implement useful functionality in real world scenarios. A few others are useful for debugging. All of the sample plugins serve as practical examples of how the stub library can be used to implement NRI plugins.

Protocol, Plugin API

The core of NRI is defined by a protobuf protocol definition of the low-level plugin API. The API defines two services, Runtime and Plugin.

The Runtime service is the public interface runtimes expose for NRI plugins. All requests on this interface are initiated by the plugin. The interface provides functions for

  • initiating plugin registration
  • requesting unsolicited updates to containers

The Plugin service is the public interface NRI uses to interact with plugins. All requests on this interface are initiated by NRI/the runtime. The interface provides functions for

  • configuring the plugin
  • getting initial list of already existing pods and containers
  • hooking the plugin into pod/container lifecycle events
  • shutting down the plugin
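As a rough illustration, the two services can be pictured as a pair of Go interfaces. The method names and signatures below are simplified assumptions for illustration only (the authoritative definitions are the protobuf service definitions), and a single generic HandleEvent method stands in for the individual lifecycle hooks.

```go
package main

import "fmt"

// Simplified, hypothetical stand-ins for the protobuf messages.
type Pod struct{ ID, Name string }

type Container struct{ ID, PodID, Name string }

type ContainerUpdate struct {
	ContainerID string
	CPUShares   uint64
}

// Runtime models the service runtimes expose; the plugin initiates every request.
type Runtime interface {
	// RegisterPlugin starts registration; name and index identify the plugin.
	RegisterPlugin(name, index string) error
	// UpdateContainers requests unsolicited updates to existing containers.
	UpdateContainers(updates []ContainerUpdate) error
}

// Plugin models the service plugins expose; NRI/the runtime initiates every request.
type Plugin interface {
	// Configure hands over plugin-specific config and returns the event subscription.
	Configure(config string) (events []string, err error)
	// Synchronize delivers existing pods and containers; the plugin may answer with updates.
	Synchronize(pods []Pod, ctrs []Container) ([]ContainerUpdate, error)
	// HandleEvent hooks the plugin into a single pod/container lifecycle event.
	HandleEvent(event string, ctr Container) error
	// Shutdown asks the plugin to stop.
	Shutdown()
}

// recorder is a trivial Plugin implementation that remembers the events it saw.
type recorder struct{ seen []string }

func (r *recorder) Configure(string) ([]string, error) {
	return []string{"CreateContainer"}, nil
}

func (r *recorder) Synchronize([]Pod, []Container) ([]ContainerUpdate, error) {
	return nil, nil
}

func (r *recorder) HandleEvent(event string, ctr Container) error {
	r.seen = append(r.seen, event+":"+ctr.ID)
	return nil
}

func (r *recorder) Shutdown() {}

func main() {
	var p Plugin = &recorder{}
	events, _ := p.Configure("")
	fmt.Println("subscribed to:", events)
}
```

Splitting the API this way makes the direction of each call explicit: anything a plugin asks for goes through Runtime, anything the runtime pushes goes through Plugin.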

Plugin Registration

Before a plugin can start receiving and processing container events, it needs to register itself with NRI. During registration the plugin and NRI perform a handshake sequence which consists of the following steps:

  1. the plugin identifies itself to the runtime
  2. NRI provides plugin-specific configuration data to the plugin
  3. the plugin subscribes to pod and container lifecycle events of interest
  4. NRI sends list of existing pods and containers to plugin
  5. the plugin requests any updates deemed necessary to existing containers

The plugin identifies itself to NRI by a plugin name and a plugin index. The plugin index is used by NRI to determine in which order the plugin is hooked into pod and container lifecycle event processing with respect to any other plugins.
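A minimal sketch of how the index could determine invocation order, assuming the index is a fixed-width numeric string such as "10" (so lexical order equals numeric order) and that ties are broken by plugin name. Both the pluginInfo type and the tie-breaking rule are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// pluginInfo is a hypothetical record of a registered plugin.
type pluginInfo struct {
	Name  string
	Index string // assumed fixed-width, e.g. "05", "10", "90"
}

// orderPlugins returns plugins in invocation order: lower index first,
// with the name as an (assumed) tie-breaker. The input is not modified.
func orderPlugins(plugins []pluginInfo) []pluginInfo {
	out := append([]pluginInfo(nil), plugins...)
	sort.Slice(out, func(i, j int) bool {
		if out[i].Index != out[j].Index {
			return out[i].Index < out[j].Index
		}
		return out[i].Name < out[j].Name
	})
	return out
}

func main() {
	ordered := orderPlugins([]pluginInfo{
		{"logger", "90"}, {"hook-injector", "10"}, {"device-injector", "10"},
	})
	for _, p := range ordered {
		fmt.Println(p.Index, p.Name)
	}
}
```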

The plugin name is used to pick plugin-specific data to send to the plugin as configuration. This data is only present if the plugin has been launched by NRI. If the plugin has been externally started it is expected to acquire its configuration also by external means. The plugin subscribes to pod and container lifecycle events of interest in its response to configuration.

As the last step in the registration and handshaking process, NRI sends the full set of pods and containers known to the runtime. The plugin can request updates it considers necessary to any of the known containers in response.

Once the handshake sequence is over and the plugin has registered with NRI, it will start receiving pod and container lifecycle events according to its subscription.
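The handshake described above can be modelled as a small, single-process simulation. Everything here (the runtime and plugin types, their method names, and the placeholder subscription) is hypothetical; a real plugin performs these steps over its ttRPC connection rather than through direct method calls.

```go
package main

import (
	"errors"
	"fmt"
)

type container struct{ ID string }

// plugin is a hypothetical plugin with a fixed subscription.
type plugin struct {
	name, index string
	config      string
}

// configure stores the config (empty if the plugin was started externally)
// and returns the events the plugin subscribes to.
func (p *plugin) configure(cfg string) []string {
	p.config = cfg
	return []string{"CreateContainer", "StopContainer"}
}

// synchronize sees the existing containers and returns the IDs of those it
// wants updated; this placeholder policy requests none.
func (p *plugin) synchronize(ctrs []container) []string { return nil }

// runtime holds per-plugin configuration, known containers and subscriptions.
type runtime struct {
	configs    map[string]string   // plugin name -> config data
	containers []container         // containers already known to the runtime
	subscribed map[string][]string // plugin name -> subscribed events
}

// register walks through the handshake steps in order.
func (r *runtime) register(p *plugin) error {
	// 1. The plugin identifies itself by name and index.
	if p.name == "" || p.index == "" {
		return errors.New("plugin must provide a name and an index")
	}
	// 2.+3. NRI sends config looked up by name; the plugin replies with its subscription.
	events := p.configure(r.configs[p.name])
	// 4.+5. NRI sends existing containers; the plugin may request updates.
	if ids := p.synchronize(r.containers); len(ids) > 0 {
		fmt.Println("plugin requested updates for", ids)
	}
	r.subscribed[p.name] = events
	return nil
}

func main() {
	r := &runtime{
		configs:    map[string]string{"logger": "verbose: true"},
		containers: []container{{ID: "c1"}},
		subscribed: map[string][]string{},
	}
	if err := r.register(&plugin{name: "logger", index: "10"}); err != nil {
		fmt.Println("registration failed:", err)
		return
	}
	fmt.Println("logger subscribed to:", r.subscribed["logger"])
}
```

Only after register returns does the (simulated) runtime know which events to deliver, which matches the rule that event delivery starts once the handshake is complete.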

Pod Data and Available Lifecycle Events


NRI plugins can subscribe to the following pod lifecycle events:

  • creation
  • stopping
  • removal

The following pieces of pod metadata are available to plugins in NRI:

  • ID
  • name
  • UID
  • namespace
  • labels
  • annotations
  • cgroup parent directory
  • runtime handler name
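For illustration, a Go plugin might hold this pod metadata in a struct like the one below and use it to decide whether a pod is of interest. The struct, the event constants, and the selection policy are all hypothetical; they only mirror the lists above.

```go
package main

import "fmt"

// PodSandbox mirrors the pod metadata listed above (hypothetical Go type).
type PodSandbox struct {
	ID             string
	Name           string
	UID            string
	Namespace      string
	Labels         map[string]string
	Annotations    map[string]string
	CgroupParent   string
	RuntimeHandler string
}

// PodEvent enumerates the pod lifecycle events a plugin can subscribe to.
type PodEvent int

const (
	PodCreation PodEvent = iota
	PodStopping
	PodRemoval
)

// selectPod is an example policy: act only on pods in the given namespace
// that carry the given label key.
func selectPod(pod *PodSandbox, namespace, labelKey string) bool {
	_, hasLabel := pod.Labels[labelKey]
	return pod.Namespace == namespace && hasLabel
}

func main() {
	pod := &PodSandbox{
		ID:        "pod-1",
		Name:      "web",
		Namespace: "default",
		Labels:    map[string]string{"app": "web"},
	}
	fmt.Println("selected:", selectPod(pod, "default", "app"))
}
```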

Container Data and Available Lifecycle Events


NRI plugins can subscribe to the following container lifecycle events:

  • creation (*)
  • post-creation
  • starting
  • post-start
  • updating (*)
  • post-update
  • stopping (*)
  • removal

(*) Plugins can request adjustments or updates to containers in response to these events.
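A plugin's subscription can be expressed as a bitmask over these events. The sketch below assumes hypothetical event names and a simple bitmask type, and encodes which events (the ones marked with *) allow a response that changes containers.

```go
package main

import "fmt"

// Event enumerates the container lifecycle events listed above.
type Event int

const (
	CreateContainer Event = iota
	PostCreateContainer
	StartContainer
	PostStartContainer
	UpdateContainer
	PostUpdateContainer
	StopContainer
	RemoveContainer
)

// EventMask is a bitmask holding a plugin's subscription.
type EventMask uint32

// Mask builds a subscription from a list of events.
func Mask(events ...Event) EventMask {
	var m EventMask
	for _, e := range events {
		m |= 1 << uint(e)
	}
	return m
}

// Has reports whether the subscription includes e.
func (m EventMask) Has(e Event) bool { return m&(1<<uint(e)) != 0 }

// CanChange reports whether a response to e may carry adjustments or
// updates, i.e. whether e is one of the events marked (*) above.
func CanChange(e Event) bool {
	switch e {
	case CreateContainer, UpdateContainer, StopContainer:
		return true
	}
	return false
}

func main() {
	sub := Mask(CreateContainer, StartContainer)
	for _, e := range []Event{CreateContainer, StartContainer, StopContainer} {
		fmt.Printf("event %d: subscribed=%v, may change containers=%v\n", e, sub.Has(e), CanChange(e))
	}
}
```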

The following pieces of container metadata are available to plugins in NRI:

  • ID
  • pod ID
  • name
  • state
  • labels
  • annotations
  • command line arguments
  • environment variables
  • mounts
  • OCI hooks
  • rlimits
  • linux
    • namespace IDs
    • devices
    • resources
      • memory
        • limit
        • reservation
        • swap limit
        • kernel limit
        • kernel TCP limit
        • swappiness
        • OOM disabled flag
        • hierarchical accounting flag
        • hugepage limits
      • CPU
        • shares
        • quota
        • period
        • realtime runtime
        • realtime period
        • cpuset CPUs
        • cpuset memory
      • Block I/O class
      • RDT class

Apart from data identifying the container, these pieces of information represent the corresponding data in the container's OCI Spec.

Container Adjustment

During container creation plugins can request changes to the following container parameters:

  • annotations
  • mounts
  • environment variables
  • OCI hooks
  • rlimits
  • linux
    • devices
    • resources
      • memory
        • limit
        • reservation
        • swap limit
        • kernel limit
        • kernel TCP limit
        • swappiness
        • OOM disabled flag
        • hierarchical accounting flag
        • hugepage limits
      • CPU
        • shares
        • quota
        • period
        • realtime runtime
        • realtime period
        • cpuset CPUs
        • cpuset memory
      • Block I/O class
      • RDT class
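A plugin does not modify the container directly; it returns a description of the changes it wants. A minimal sketch, assuming a hypothetical ContainerAdjustment type that covers only a few of the fields above and a made-up annotation key (example.com/cpuset) for requesting CPU pinning:

```go
package main

import "fmt"

// KeyValue is an environment variable assignment.
type KeyValue struct{ Key, Value string }

// ContainerAdjustment is a hypothetical, trimmed-down adjustment covering a
// few of the parameters listed above. Zero values mean "leave unchanged".
type ContainerAdjustment struct {
	Annotations map[string]string
	Env         []KeyValue
	CPUSetCPUs  string
	MemoryLimit int64
}

// Container is a minimal view of the container being created.
type Container struct {
	Name        string
	Annotations map[string]string
}

// cpusetAnnotation is a made-up annotation key used only for this example.
const cpusetAnnotation = "example.com/cpuset"

// createContainer returns the changes a plugin wants for a new container:
// pin it to the CPUs requested via the annotation and inject an env variable.
func createContainer(ctr *Container) *ContainerAdjustment {
	adj := &ContainerAdjustment{Annotations: map[string]string{}}
	if cpus, ok := ctr.Annotations[cpusetAnnotation]; ok {
		adj.CPUSetCPUs = cpus
		adj.Annotations["example.com/pinned"] = "true"
	}
	adj.Env = append(adj.Env, KeyValue{Key: "NRI_EXAMPLE", Value: "1"})
	return adj
}

func main() {
	ctr := &Container{Name: "worker", Annotations: map[string]string{cpusetAnnotation: "0-3"}}
	adj := createContainer(ctr)
	fmt.Printf("cpuset=%q annotations=%v env=%v\n", adj.CPUSetCPUs, adj.Annotations, adj.Env)
}
```

Returning a change set rather than mutating state is what lets NRI combine and conflict-check the responses of several plugins before anything is applied.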

Container Updates

Once a container has been created, plugins can request updates to it. These updates can be requested in response to another container's creation request, in response to any container's update or stop request, or as part of a separate, unsolicited container update request. The following container parameters can be updated this way:

  • resources
    • memory
      • limit
      • reservation
      • swap limit
      • kernel limit
      • kernel TCP limit
      • swappiness
      • OOM disabled flag
      • hierarchical accounting flag
      • hugepage limits
    • CPU
      • shares
      • quota
      • period
      • realtime runtime
      • realtime period
      • cpuset CPUs
      • cpuset memory
    • Block I/O class
    • RDT class
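As an example of updates requested in response to another container's creation, consider a hypothetical policy that gives a newly created container exclusive CPUs and shrinks the cpuset of the already running, shared containers accordingly. The ContainerUpdate type and the policy are assumptions; only a resource field appears, since resources are all that can be updated after creation.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// ContainerUpdate is a hypothetical, trimmed-down update request.
type ContainerUpdate struct {
	ContainerID string
	CPUSetCPUs  string
}

// cpuList formats CPU IDs as a comma-separated cpuset string such as "0,1,4".
func cpuList(cpus []int) string {
	parts := make([]string, len(cpus))
	for i, c := range cpus {
		parts[i] = strconv.Itoa(c)
	}
	return strings.Join(parts, ",")
}

// updatesForExclusive is an illustrative policy: when a newly created
// container claims some CPUs exclusively, every already running shared
// container is restricted to the remaining CPUs of the pool.
func updatesForExclusive(shared []string, pool, claimed []int) []ContainerUpdate {
	taken := make(map[int]bool, len(claimed))
	for _, c := range claimed {
		taken[c] = true
	}
	var remaining []int
	for _, c := range pool {
		if !taken[c] {
			remaining = append(remaining, c)
		}
	}
	updates := make([]ContainerUpdate, 0, len(shared))
	for _, id := range shared {
		updates = append(updates, ContainerUpdate{ContainerID: id, CPUSetCPUs: cpuList(remaining)})
	}
	return updates
}

func main() {
	updates := updatesForExclusive([]string{"ctr-a", "ctr-b"}, []int{0, 1, 2, 3}, []int{2, 3})
	for _, u := range updates {
		fmt.Printf("update %s: cpuset=%s\n", u.ContainerID, u.CPUSetCPUs)
	}
}
```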

Runtime Adaptation

The NRI runtime adaptation package is the interface runtimes use to integrate with NRI and interact with NRI plugins. It implements basic plugin discovery, startup, and configuration. It also provides the functions necessary to hook NRI plugins into the lifecycle events of pods and containers in the runtime.

The package hides the fact that multiple NRI plugins might be processing any single pod or container lifecycle event. It takes care of invoking plugins in the correct order and combining responses by multiple plugins into a single one. While combining responses, the package detects any unintentional conflicting changes made by multiple plugins to a single container and flags such an event as an error to the runtime.
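A minimal sketch of conflict detection while combining responses, assuming each plugin's response is flattened into hypothetical "field path" to value pairs and that responses arrive in plugin invocation order. The flattening is an assumption made to keep the example short; a field changed by two different plugins is reported as an error.

```go
package main

import "fmt"

// response is a plugin's adjustment, flattened into field path -> value
// pairs (a hypothetical representation).
type response struct {
	plugin  string
	changes map[string]string
}

// merge combines responses given in plugin invocation order into a single
// set of changes, and returns an error if two different plugins try to
// change the same field.
func merge(responses []response) (map[string]string, error) {
	merged := map[string]string{}
	owner := map[string]string{} // field path -> plugin that changed it
	for _, r := range responses {
		for field, value := range r.changes {
			if prev, ok := owner[field]; ok && prev != r.plugin {
				return nil, fmt.Errorf("plugins %q and %q both changed %s", prev, r.plugin, field)
			}
			owner[field] = r.plugin
			merged[field] = value
		}
	}
	return merged, nil
}

func main() {
	_, err := merge([]response{
		{plugin: "cpu-pinner", changes: map[string]string{"linux.resources.cpu.cpus": "0-1"}},
		{plugin: "balancer", changes: map[string]string{"linux.resources.cpu.cpus": "2-3"}},
	})
	fmt.Println("merge result:", err)
}
```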

Wrapped OCI Spec Generator

The OCI Spec generator package wraps the corresponding generator package from OCI runtime-tools and adds functions for applying NRI container adjustments and updates to OCI Specs. Runtime NRI integration code can use this package to apply NRI responses to containers.
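To make "applying an adjustment" concrete, the sketch below applies a few adjustment fields to a miniature spec. Both the spec and adjustment types are hypothetical simplifications of an OCI Spec and an NRI adjustment, and the rule that an adjusted environment variable replaces an existing entry with the same name, rather than duplicating it, is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// spec is a tiny stand-in for an OCI runtime Spec (only a few fields).
type spec struct {
	Env         []string // entries of the form "KEY=value"
	Annotations map[string]string
	CPUs        string // cpuset CPUs
}

type kv struct{ Key, Value string }

// adjustment is a hypothetical, trimmed-down NRI container adjustment.
type adjustment struct {
	Env         []kv
	Annotations map[string]string
	CPUs        string // empty means "no change"
}

// setEnv replaces an existing KEY=... entry or appends a new one.
func setEnv(env []string, key, value string) []string {
	entry := key + "=" + value
	for i, e := range env {
		if strings.HasPrefix(e, key+"=") {
			env[i] = entry
			return env
		}
	}
	return append(env, entry)
}

// applyAdjustment applies adj to s in place.
func applyAdjustment(s *spec, adj *adjustment) {
	for _, e := range adj.Env {
		s.Env = setEnv(s.Env, e.Key, e.Value)
	}
	if len(adj.Annotations) > 0 && s.Annotations == nil {
		s.Annotations = map[string]string{}
	}
	for k, v := range adj.Annotations {
		s.Annotations[k] = v
	}
	if adj.CPUs != "" {
		s.CPUs = adj.CPUs
	}
}

func main() {
	s := &spec{Env: []string{"PATH=/usr/bin", "MODE=default"}}
	applyAdjustment(s, &adjustment{
		Env:         []kv{{"MODE", "tuned"}, {"NRI", "1"}},
		Annotations: map[string]string{"example.com/adjusted": "true"},
		CPUs:        "0-1",
	})
	fmt.Println(s.Env, s.Annotations, s.CPUs)
}
```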

Plugin Stub Library

The plugin stub hides many of the low-level details of implementing an NRI plugin. It takes care of connection establishment, plugin registration, configuration, and event subscription. All sample plugins are implemented using the stub. Any of these can be used as a tutorial on how the stub library should be used.
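The general idea of such a stub can be sketched in a few lines: a plugin implements only the handlers it cares about, and the stub derives the event subscription from those handlers and dispatches events to them. This is a hypothetical, heavily simplified model (no connection handling, registration, or configuration), and all interface and type names are made up for illustration; the stub library's own API is what plugins should actually use.

```go
package main

import "fmt"

// Container is a minimal, hypothetical container view.
type Container struct{ ID string }

// Optional handler interfaces: a plugin implements only the ones it needs.
type CreateContainerHandler interface {
	CreateContainer(ctr *Container) error
}

type StopContainerHandler interface {
	StopContainer(ctr *Container) error
}

// stub holds the plugin and the subscription derived from its handlers.
type stub struct {
	plugin interface{}
	events []string
}

// newStub inspects which handler interfaces the plugin implements.
func newStub(plugin interface{}) *stub {
	s := &stub{plugin: plugin}
	if _, ok := plugin.(CreateContainerHandler); ok {
		s.events = append(s.events, "CreateContainer")
	}
	if _, ok := plugin.(StopContainerHandler); ok {
		s.events = append(s.events, "StopContainer")
	}
	return s
}

// dispatch forwards an event to the matching handler, if the plugin has one.
func (s *stub) dispatch(event string, ctr *Container) error {
	switch event {
	case "CreateContainer":
		if h, ok := s.plugin.(CreateContainerHandler); ok {
			return h.CreateContainer(ctr)
		}
	case "StopContainer":
		if h, ok := s.plugin.(StopContainerHandler); ok {
			return h.StopContainer(ctr)
		}
	}
	return nil
}

// logger is an example plugin that only handles container creation.
type logger struct{ created []string }

func (l *logger) CreateContainer(ctr *Container) error {
	l.created = append(l.created, ctr.ID)
	return nil
}

func main() {
	l := &logger{}
	s := newStub(l)
	fmt.Println("subscribed to:", s.events)
	if err := s.dispatch("CreateContainer", &Container{ID: "c1"}); err != nil {
		fmt.Println("error:", err)
	}
	fmt.Println("created:", l.created)
}
```

Deriving the subscription from the implemented handlers keeps plugin code free of explicit subscription bookkeeping.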

Sample Plugins

The following sample plugins exist for NRI:

Please see the documentation of these plugins for further details on what each of them can be used for and how.

Security Considerations

From a security perspective, NRI plugins should be considered part of the container runtime. NRI does not implement granular access control to the functionality it offers. Access to NRI is controlled by restricting access to the system-wide NRI socket. If a process can connect to the NRI socket and send data, it has access to the full scope of functionality available via NRI.

In particular, this includes:

  • injection of OCI hooks, which allow for arbitrary execution of processes with the same privilege level as the container runtime
  • arbitrary changes to mounts, including new bind-mounts, changes to the proc, sys, mqueue, shm, and tmpfs mounts
  • the addition or removal of arbitrary devices
  • arbitrary changes to the limits for memory, CPU, block I/O, and RDT resources available, including the ability to deny service by setting limits very low

The same precautions and principles apply to protecting the NRI socket as to protecting the socket of the runtime itself. Unless it already exists, NRI itself creates the directory to hold its socket with permissions that allow access only for the user ID of the runtime process. By default this limits NRI access to processes running as root (UID 0). Changing the default socket permissions is strongly advised against. Enabling more permissive access control to NRI should never be done without fully understanding the full implications and potential consequences to container security.

Plugins as Kubernetes DaemonSets

When the runtime manages pods and containers in a Kubernetes cluster, it is convenient to deploy and manage NRI plugins using Kubernetes DaemonSets. Among other things, this requires bind-mounting the NRI socket into the filesystem of a privileged container running the plugin. Similar precautions apply and the same care should be taken for protecting the NRI socket and NRI plugins as for the kubelet DeviceManager socket and Kubernetes Device Plugins.

The cluster configuration should make sure that unauthorized users cannot bind-mount host directories and create privileged containers which gain access to these sockets and can act as NRI or Device Plugins. See the related documentation and best practices about Kubernetes security.

API Stability

NRI APIs should not be considered stable yet. We try to avoid unnecessarily breaking APIs, especially the Stub API which plugins use to interact with NRI. However, before NRI reaches a stable 1.0.0 release, this is only best effort and cannot be guaranteed. Meanwhile we do our best to document any API breaking changes for each release in the release notes.

The current target for a stable v1 API through a 1.0.0 release is the end of this year.

Project details

nri is a containerd sub-project, licensed under the Apache 2.0 license. As a containerd sub-project, you will find the project governance, maintainers, and contributing guidelines information in our containerd/project repository.