hyperhq/runv

Can I specify the runtime used for the container runv creates behind the vm?

telala opened this issue · 15 comments

runv is an OCI-compatible runtime, and we use it as the runtime when starting a container:
docker run --rm -it busybox sh

I want to know: can we specify the runtime for the container behind the vm, such as the nvidia docker runtime?
@gnawux @bergwolf

@telala runv itself is an OCI runtime at the same layer as docker runc and the nvidia docker runtime. Any of them can be selected through the docker run --runtime <oci-runtime> parameter, but they cannot be combined, because a container can only be backed by one OCI runtime.
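
For reference, this is roughly how an additional OCI runtime is registered with the Docker daemon (in /etc/docker/daemon.json) and then selected per container; the runv binary path below is an assumption:

  {
    "runtimes": {
      "runv": { "path": "/usr/local/bin/runv" }
    }
  }

After restarting dockerd, the runtime is picked per container:

  docker run --runtime runv --rm -it busybox sh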

Actually nvidia patched docker runc to include its own prestart hooks in the nvidia docker runtime. It might be possible to integrate those hooks into runv (plus adding gpu passthrough support), but it cannot be done through configuration alone, sorry.
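
For context, an OCI prestart hook is just an entry in the container's config.json that the runtime executes before starting the container process. An illustrative fragment is below; the hook path is an assumption, since the actual binary name varies across nvidia-container-runtime versions:

  {
    "hooks": {
      "prestart": [
        { "path": "/usr/bin/nvidia-container-runtime-hook" }
      ]
    }
  }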

@bergwolf runc and the nvidia docker runtime are at the same level when using docker,
but runv and the nvidia docker runtime are NOT at the same level: runv is used on the host, while nvidia docker is used for the container in the guest.

If we separate what runv does into two steps, it becomes clearer:

  1. create a kvm guest.
  2. create a container in the kvm guest; so the question should be: can we specify a runtime for this container?

@bergwolf Is my understanding right?

@telala It is actually the job of hyperstart to create containers in a guest. There is no docker stack in the guest and thus no concept of a runtime there either.

On the host, there can be a docker software stack and runv is in the same position as runc and nvidia docker runtime.

@bergwolf I changed the runv code to pass through a gpu. Now I can see the gpu device with the lspci command in the container, but there is no gpu node under /dev in the container.
I think this is because there is no gpu driver in the guest.
But how do I install the driver for the gpu?

I think the gpu driver should be installed in the guest.
But I do not know how to install it using hyperstart.
Can you give me some advice?
Thanks very much.

@telala do you mean the gpu kernel driver? You need to include it in the initrd image, either by building it into your guest kernel or by putting it in as a kernel module.

Yes, the gpu kernel driver. Since the nvidia driver is closed source, I copied nvidia.ko from a machine that already has the gpu driver installed, put nvidia.ko into modules.tar, and then generated the initrd.

Or put the nvidia.ko into the kernel tree and change the Makefile to include it, then build the kernel.

I am a little concerned about the nvidia driver symbols.

Am I right?
@bergwolf

@telala No, I'm afraid not. You have to use the same kernel (in the guest) that the nvidia.ko was built against. Mismatched kernel versions can either prevent the module from loading or cause unexpected kernel oopses.

If you cannot build nvidia.ko on your own (since it's closed source), the only option is to get a kernel that works with your nvidia.ko and see whether that kernel can boot the guest with runv.
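
A quick way to verify that a prebuilt nvidia.ko actually matches the guest kernel is to compare the module's vermagic string with the kernel release the guest boots; a small sketch, with file locations assumed:

  # wherever the prebuilt module lives
  modinfo ./nvidia.ko | grep -i vermagic
  # compare against the guest kernel release (run inside the guest,
  # or check the version of the kernel image you pass to runv)
  uname -r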

@bergwolf But does the guest created by runv have to use the hyperstart-built kernel and initrd?
Can I use my own kernel and initrd with runv?

@telala Yes, you can use your own kernel and re-create an initrd based on it. If you look at https://github.com/hyperhq/hyperstart/blob/master/build/make-initrd.sh, you can see how the initrd is created. Replace the kernel and modules.tar there with your own, and you can re-create the initrd image.
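
Independent of make-initrd.sh, a generic way to add a prebuilt module to an existing initrd is to unpack the gzipped cpio archive, drop the .ko in, and repack it; a rough sketch, with the image file names and guest kernel version assumed:

  KVER=4.12.4                                      # assumed guest kernel version
  mkdir initrd-root && cd initrd-root
  zcat ../hyper-initrd.img | cpio -idmv            # unpack the existing initrd
  mkdir -p lib/modules/$KVER/extra
  cp ../nvidia.ko lib/modules/$KVER/extra/
  find . | cpio -o -H newc | gzip > ../new-initrd.img   # repack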

@bergwolf I tried to use the kernel_config in arch/x86_64 to compile 4.12.4, but the system refuses to run: there are no error messages and the guest seems to hang.

I tried to compare the kernel_config for runv with the config on our system, and there are too many differences :(
So I want to know: are there any kernel config options that runv requires?
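
For what it's worth, one way to shrink that delta is to start from the kernel_config shipped with hyperstart and let the 4.12.4 tree fill in defaults for any symbols it doesn't recognize, then diff the result; a rough sketch, with the hyperstart checkout path assumed:

  # run inside the 4.12.4 kernel source tree
  cp ~/hyperstart/build/arch/x86_64/kernel_config .config
  make olddefconfig        # keep existing answers, take defaults for new symbols
  # show only the options that differ between the two configs
  ./scripts/diffconfig ~/hyperstart/build/arch/x86_64/kernel_config .config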

@bergwolf Following your advice, I installed the gpu driver on a kernel and re-created the initrd based on that kernel (kernel and modules).
After the container started, I used 'insmod nvidia.ko' to load the nvidia module. From dmesg it looks like the nvidia driver loaded successfully:
[ 119.230131] nvidia: loading out-of-tree module taints kernel.
[ 119.230756] nvidia: module license 'NVIDIA' taints kernel.
[ 119.231303] Disabling lock debugging due to kernel taint
[ 119.245305] nvidia-nvlink: Nvlink Core is being initialized, major device number 240
[ 119.276061] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 390.46 Fri Mar 16 22:24:50 PDT 2018

But there are no nvidiactl and nvidia0 nodes under /dev (these two device nodes should be created after the nvidia driver is loaded; I can see both of them on my host machine).

Can you give me some advice on this problem? Where do you think I should insmod nvidia.ko?

@telala Is the above log from the guest kernel? If so, my guess is that the gpu device is not properly passed through to the guest. Care to send your gpu passthrough patch here so that we can review and merge it upstream?

@bergwolf Yes, the above log is from the guest kernel. To support passthrough I just added
"-device", "vfio-pci,host=0000:08:00.0,id=gpu_0,bus=pci.0,addr=0xf" in amd_64.go.
Using lspci in the guest I can see the gpu device at the BDF I specified above.
Could this be because of the mount namespace?
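
For completeness, qemu can only open a vfio-pci device if the gpu is bound to the vfio-pci driver on the host; since lspci already shows the device in the guest this was presumably done, but for reference one common way (assuming the IOMMU is already enabled on the host) is:

  BDF=0000:08:00.0                                   # the BDF used above
  modprobe vfio-pci
  echo vfio-pci > /sys/bus/pci/devices/$BDF/driver_override
  # detach the device from whatever host driver currently owns it, then reprobe it
  [ -e /sys/bus/pci/devices/$BDF/driver ] && echo "$BDF" > /sys/bus/pci/devices/$BDF/driver/unbind
  echo "$BDF" > /sys/bus/pci/drivers_probe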

@telala No, devtmpfs is not mount namespace aware; containers get the same view of devtmpfs as hyperstart. Can you mknod the device? The log only prints the device major number; can you find the device minor somewhere? It seems that there is still something wrong with the device setup.
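
A hedged sketch of creating the nodes by hand inside the guest: the major number has to be read from /proc/devices after the module is loaded, and the minor numbers below follow the usual nvidia convention (255 for nvidiactl, 0 for the first gpu), so treat them as assumptions:

  # find the character-device major registered by the nvidia module
  grep nvidia /proc/devices
  # suppose it prints "195 nvidia"; then create the nodes
  NV_MAJOR=195
  mknod -m 666 /dev/nvidiactl c $NV_MAJOR 255
  mknod -m 666 /dev/nvidia0   c $NV_MAJOR 0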

@bergwolf I opened a new issue to discuss the passthrough support in runv:
#680