Containerized Data Importer

Containerized-Data-Importer (CDI) is a persistent storage management add-on for Kubernetes. It's primary goal is to provide a declarative way to build Virtual Machine Disks on PVCs for Kubevirt VMs

CDI works with standard core Kubernetes resources and is storage device agnostic, while its primary focus is to build disk images for Kubevirt, it's also useful outside of a Kubevirt context to use for initializing your Kubernetes Volumes with data.

Introduction

Kubernetes extension to populate PVCs with VM disk images or other data

CDI provides the ability to populate PVCs with VM images or other data upon creation. The data can come from different sources: a URL, a container registry, another PVC (clone), or an upload from a client.

DataVolumes

CDI includes a CustomResourceDefinition (CRD) that provides an object of type DataVolume. The DataVolume is an abstraction on top of the standard Kubernetes PVC and can be used to automate creation and population of a PVC with data. Although you can use PVCs directly with CDI, DataVolumes are the preferred method since they offer full functionality, a stable API, and better integration with kubevirt. More details about DataVolumes can be found here.

Import from URL

This method is selected when you create a DataVolume with an http source. CDI will populate the volume using a pod that will download from the given URL and handle the content according to the contentType setting (see below). It is possible to configure basic authentication using a secret and specify custom TLS certificates in a ConfigMap.

Import from container registry

When a DataVolume has a registry source CDI will populate the volume with a Container Disk downloaded from the given image URL. The only valid contentType for this source is kubevirt and the image must be a Container Disk. More details can be found here.

Clone another PVC

To clone a PVC, create a DataVolume with a pvc source and specify namespace and name of the source PVC. CDI will attempt an efficient clone of the PVC using the storage backend if possible. Otherwise, the data will be transferred to the target PVC using a TLS secured connection between two pods on the cluster network. More details can be found here.

Upload from a client

To upload data to a PVC from a client machine first create a DataVolume with an upload source. CDI will prepare to receive data via an upload proxy which will transit data from an authenticated client to a pod which will populate the PVC according to the contentType setting. To send data to the upload proxy you must have a valid UploadToken. See the upload documentation for details.

Prepare an empty Kubevirt VM disk

The special source blank can be used to populate a volume with an empty Kubevirt VM disk. This source is valid only with the kubevirt contentType. CDI will create a VM disk on the PVC which uses all of the available space. See here for an example.

Import from oVirt

Virtual machine disks can be imported from a running oVirt installation using the imageio source. CDI will use the provided credentials to securely transfer the indicated oVirt disk image so that it can be used with kubevirt. See here for more information and examples.

Import from VMware

Disks can be imported from VMware with the vddk source. CDI will transfer the disks using vCenter/ESX API credentials and a user-provided image containing the non-redistributable VDDK library. See here for instructions.

Content Types

CDI features specialized handling for two types of content: Kubevirt VM disk images and tar archives. The kubevirt content type indicates that the data being imported should be treated as a Kubevirt VM disk. CDI will automatically decompress and convert the file from qcow2 to raw format if needed. It will also resize the disk to use all available space. The archive content type indicates that the data is a tar archive. Compression is not yet supported for archives. CDI will extract the contents of the archive into the volume. The content type can be selected by specifying the contentType field in the DataVolume. kubevirt is the default content type. CDI only supports certain combinations of source and contentType as indicated below:

http → kubevirt, archive
registry → kubevirt
pvc → Not applicable - content is cloned
upload → kubevirt
imageio → kubevirt
vddk → kubevirt

Deploy it

Deploying the CDI controller is straightforward. In this document the default namespace is used, but in a production setup a protected namespace that is inaccessible to regular users should be used instead.

$ export VERSION=$(curl -s https://api.github.com/repos/kubevirt/containerized-data-importer/releases/latest | grep '"tag_name":' | sed -E 's/.*"([^"]+)".*/\1/')
$ kubectl create -f https://github.com/kubevirt/containerized-data-importer/releases/download/$VERSION/cdi-operator.yaml
$ kubectl create -f https://github.com/kubevirt/containerized-data-importer/releases/download/$VERSION/cdi-cr.yaml

Use it

Create a DataVolume and populate it with data from an http source

$ kubectl create -f https://raw.githubusercontent.com/kubevirt/containerized-data-importer/$VERSION/manifests/example/import-kubevirt-datavolume.yaml

There are quite a few examples in the example manifests, check them out as a reference to create DataVolumes from additional sources like registries, S3 and your local system.

Hack it

CDI includes a self contained development and test environment. We use Docker to build, and we provide a simple way to get a test cluster up and running on your laptop. The development tools include a version of kubectl that you can use to communicate with the cluster. A wrapper script to communicate with the cluster can be invoked using ./cluster-up/kubectl.sh.

$ mkdir $GOPATH/src/kubevirt.io && cd $GOPATH/src/kubevirt.io
$ git clone https://github.com/kubevirt/containerized-data-importer && cd containerized-data-importer
$ make cluster-up
$ make cluster-sync
$ ./cluster-up/kubectl.sh .....

For development on external cluster (not provisioned by our CI), check out the external provider.

Storage notes

CDI is designed to be storage agnostic. Since it works with the kubernetes storage APIs it should work well with any configuration that can produce a Bound PVC. The following are storage-specific notes that may be relevant when using CDI.

NFSv3 is not supported: CDI uses qemu-img to manipulate disk images and this program uses locking which is not compatible with the obsolete NFSv3 protocol. We recommend using NFSv4.

Connect with us

We'd love to hear from you, reach out on Github via Issues or Pull Requests!

Hit us up on Slack

Shoot us an email at: kubevirt-dev@googlegroups.com

tahsinrahman/containerized-data-importer