kata-containers/agent

kata-agent get the wrong hot unplugged device info from s.pciDeviceMap in the getDeviceName function

flyflypeng opened this issue · 5 comments

Description of problem

This problem is found in a special case and reproduces probability is very small. I produce the bug by following steps:

  1. start a Pod with pause image
  2. create a container A in the Pod, however, kata-agent will create container A fail, this error occurs after addStorage in the CreateContainer function
  3. since step 2 create container A fail, k8s recreate a new container soon which marked as container B.But container B do virtioBlkStorageHandler function mount rootfs blk device fail, because container B get the wrong device name from the s.pciDeviceMap, which actually is the previous container A rootfs device name.

Why container B get the previous container A rootfs block device name which is hot unplugged already?

Because container A device hot unplug operation in kata-runtime and block device remove uevent received in the kata-agent is asynchronous.

The events timeline is:

  1. container A create fail in the kata-agent, kata-agent return the grpc error info back to kata-runtime
  2. kata-runtime get the grpc error from kata-agent and then call rollbackFailingContainerCreation to do rollback operations
  3. kata-runtime call block_del qmp command to hot unplug the container A rootfs block device from the guest and release the pciAddr.
  4. kata-runtime create a new container B and** reuse the same pciAddr as container A**, hotplug the container B rootfs block device into the guest, and createContainerRequest is sent to the kata-agent.
  5. kata-agent receive the create container B request then call getPCIDeviceName to get container B rootfs device name. However, getPCIDeviceName get the device name from s.pciDeviceMap before container A rootfs block device removed uevent received and container B rootfs block device added uevent, so getPCIDeviceName get the wrong device name of container A, because they have the same pci addr.
  6. listenToUdevEvents get the container A rootfs block device removed uevent and container B rootfs block device added uevent later, then delete the key-value store in the s.pciDeviceMap.

Log info:

2020-03-02T18:53:45.899246+08:00|info|kata-proxy[52178]|time="2020-03-02T18:53:45.898978139+08:00" level=info msg="time=\"2020-03-02T10:53:45.89874935Z\" level=info msg=\"Fetched PCI address for device\" completePCIAddr=\"0000:00:02.0/0000:01:03.0\" debug_console=false name=kata-agent pid=1 sandbox=1e5f43a202c2030439d2b6159561493c768dfbf56a96b8e51fc724a9b4bc24ac source=agent\n" name=kata-proxy pid=52178 sandbox=1e5f43a202c2030439d2b6159561493c768dfbf56a96b8e51fc724a9b4bc24ac source=agent
2020-03-02T18:53:45.899410+08:00|info|kata-proxy[52178]|time="2020-03-02T18:53:45.899106533+08:00" level=info msg="time=\"2020-03-02T10:53:45.89890689Z\" level=info msg=\"Device found in pci device map\" debug_console=false name=kata-agent pciID=02.0/03 pid=1 sandbox=1e5f43a202c2030439d2b6159561493c768dfbf56a96b8e51fc724a9b4bc24ac source=agent\n" name=kata-proxy pid=52178 sandbox=1e5f43a202c2030439d2b6159561493c768dfbf56a96b8e51fc724a9b4bc24ac source=agent
2020-03-02T18:53:45.899555+08:00|info|kata-proxy[52178]|time="2020-03-02T18:53:45.899163493+08:00" level=info msg="time=\"2020-03-02T10:53:45.89897652Z\" level=debug debug_console=false mount-destination=/run/kata-containers/shared/containers/cc138267fc121c9d53e9da75dbf7a39d6110c7f8938209f32820cb6dd1321a5e mount-flags=0 mount-fstype=ext4 mount-options= mount-source=/dev/vdc name=kata-agent pid=1 sandbox=1e5f43a202c2030439d2b6159561493c768dfbf56a96b8e51fc724a9b4bc24ac source=agent\n" name=kata-proxy pid=52178 sandbox=1e5f43a202c2030439d2b6159561493c768dfbf56a96b8e51fc724a9b4bc24ac source=agent
2020-03-02T18:53:45.900369+08:00|info|kata-proxy[52178]|time="2020-03-02T18:53:45.899258988+08:00" level=info msg="time=\"2020-03-02T10:53:45.899070343Z\" level=debug msg=\"request end\" debug_console=false duration=798.69393ms name=kata-agent pid=1 request=/grpc.AgentService/CreateContainer resp=\"&Empty{XXX_unrecognized:[],}\" sandbox=1e5f43a202c2030439d2b6159561493c768dfbf56a96b8e51fc724a9b4bc24ac source=agent\n" name=kata-proxy pid=52178 sandbox=1e5f43a202c2030439d2b6159561493c768dfbf56a96b8e51fc724a9b4bc24ac source=agent
2020-03-02T18:53:45.900543+08:00|info|kata-proxy[52178]|time="2020-03-02T18:53:45.899423618+08:00" level=info msg="time=\"2020-03-02T10:53:45.899230309Z\" level=info msg=\"Received add uevent\" debug_console=false name=kata-agent pid=1 source=agent subsystem=udevlistener uevent-action=add uevent-devname= uevent-devpath=\"/devices/pci0000:00/0000:00:02.0/0000:01:03.0\" uevent-seqnum=961 uevent-subsystem=pci\n" name=kata-proxy pid=52178 sandbox=1e5f43a202c2030439d2b6159561493c768dfbf56a96b8e51fc724a9b4bc24ac source=agent
2020-03-02T18:53:45.900718+08:00|info|kata-proxy[52178]|time="2020-03-02T18:53:45.899475761+08:00" level=info msg="time=\"2020-03-02T10:53:45.899292219Z\" level=info msg=\"Received add uevent\" debug_console=false name=kata-agent pid=1 source=agent subsystem=udevlistener uevent-action=add uevent-devname= uevent-devpath=\"/devices/pci0000:00/0000:00:02.0/0000:01:03.0/virtio5\" uevent-seqnum=962 uevent-subsystem=virtio\n" name=kata-proxy pid=52178 sandbox=1e5f43a202c2030439d2b6159561493c768dfbf56a96b8e51fc724a9b4bc24ac source=agent
2020-03-02T18:53:45.900882+08:00|err|kata-runtime[12617]|time="2020-03-02T18:53:45.899495238+08:00" level=error msg="get error after send CreateContainerRequest: rpc error: code = Internal desc = Could not resolve symlink for source /dev/vdc" arch=amd64 command=create container=cc138267fc121c9d53e9da75dbf7a39d6110c7f8938209f32820cb6dd1321a5e name=kata-runtime pid=12617 source=virtcontainers subsystem=kata_agent
2020-03-02T18:53:45.901135+08:00|info|kata-proxy[52178]|time="2020-03-02T18:53:45.899521855+08:00" level=info msg="time=\"2020-03-02T10:53:45.899340474Z\" level=info msg=\"Received add uevent\" debug_console=false name=kata-agent pid=1 source=agent subsystem=udevlistener uevent-action=add uevent-devname= uevent-devpath=\"/devices/virtual/bdi/254:48\" uevent-seqnum=963 uevent-subsystem=bdi\n" name=kata-proxy pid=52178 sandbox=1e5f43a202c2030439d2b6159561493c768dfbf56a96b8e51fc724a9b4bc24ac source=agent
2020-03-02T18:53:45.901319+08:00|info|kata-proxy[52178]|time="2020-03-02T18:53:45.899587103+08:00" level=info msg="time=\"2020-03-02T10:53:45.899392298Z\" level=info msg=\"Received add uevent\" debug_console=false name=kata-agent pid=1 source=agent subsystem=udevlistener uevent-action=add uevent-devname=vdd uevent-devpath=\"/devices/pci0000:00/0000:00:02.0/0000:01:03.0/virtio5/block/vdd\" uevent-seqnum=964 uevent-subsystem=block\n" name=kata-proxy pid=52178 sandbox=1e5f43a202c2030439d2b6159561493c768dfbf56a96b8e51fc724a9b4bc24ac source=agent
2020-03-02T18:53:45.908475+08:00|info|kata-runtime[12617]|time="2020-03-02T18:53:45.908217138+08:00" level=info msg="unplugging block device" arch=amd64 command=create container=cc138267fc121c9d53e9da75dbf7a39d6110c7f8938209f32820cb6dd1321a5e name=kata-runtime pid=12617 sandbox=1e5f43a202c2030439d2b6159561493c768dfbf56a96b8e51fc724a9b4bc24ac source=virtcontainers subsystem=container
2020-03-02T18:53:45.908741+08:00|info|kata-runtime[12617]|time="2020-03-02T18:53:45.908257325+08:00" level=info msg="Unplugging block device" arch=amd64 command=create container=cc138267fc121c9d53e9da75dbf7a39d6110c7f8938209f32820cb6dd1321a5e device=/dev/dm-102 name=kata-runtime pid=12617 source=virtcontainers subsystem=device
2020-03-02T18:57:36.478937+08:00|info|kata-proxy[52178]|time="2020-03-02T18:57:36.476780292+08:00" level=info msg="time=\"2020-03-02T10:57:36.471873172Z\" level=info msg=\"Received add uevent\" debug_console=false name=kata-agent pid=1 source=agent subsystem=udevlistener uevent-action=add uevent-devname=vde uevent-devpath=\"/devices/pci0000:00/0000:00:02.0/0000:01:03.0/virtio5/block/vde\" uevent-seqnum=1000 uevent-subsystem=block\n" name=kata-proxy pid=52178 sandbox=1e5f43a202c2030439d2b6159561493c768dfbf56a96b8e51fc724a9b4bc24ac source=agent

Expected result

always create container in the Pod successfully

Actual result

create container in the Pod failed sometimes

@devimc @bergwolf I can't figure out a trick way to fix this problem for long time, so I want to have a discussion with your nice guys and look for some advice.

@flyflypeng it looks like you are facing a race condition, the only way that I can see to fix this is to let the kata-agent wait for the device (udev event) if and only if the device is listed in pciDeviceMap but it doesn't exist, please test the following patch and let me know if it works

diff --git a/device.go b/device.go
index ae95572..53ffac5 100644
--- a/device.go
+++ b/device.go
@@ -10,6 +10,7 @@ import (
        "context"
        "fmt"
        "io/ioutil"
+       "os"
        "path"
        "path/filepath"
        "strconv"
@@ -144,6 +145,17 @@ func getDeviceName(s *sandbox, devID string) (string, error) {
                }
        }
 
+       // Device found in s.pciDeviceMap, but is it still valid?
+       if devName != "" {
+               // Look for the device under the devtmpfs (/dev)
+               if _, err := os.Stat(filepath.Join(systemDevPath, devName)); err != nil {
+                       // device not found, remove it from the map and reset devName,
+                       // this way the agent waits for the udev event that adds such device.
+                       delete(s.pciDeviceMap, devID)
+                       devName = ""
+               }
+       }
+
        // If device is not found in the device map, hotplug event has not
        // been received yet, create and add channel to the watchers map.
        // The key of the watchers map is the device we are interested in.

@devimc Sorry for the later reply, I still not reproduce this problem from the test case time and time, I think I need to lookup another way to reproduce this problem. If you have time to join and reproduce this issue, it will be more wonderful.

Closing this issue since it doesn't reproduce.
@c3d can you point this out to David Gibson to see if it relates to similar problems he is seeing?

A few thoughts: