[Help] probe-isolated-devices: Send error request process timeout
Closed this issue · 3 comments
chenjacken commented
Version: 3.11.8
OS: CentOS 7.9
Error info:
kubectl logs default-host-qksqz -n onecloud -c host --tail 100 -f
[info 2024-12-04 22:34:58 isolated_device.getPassthroughGPUs(gpu.go:86)] filter address [], enableWhiteList: false
[warning 2024-12-04 22:35:04 isolated_device.NewPCIDevice2(gpu.go:241)] fillPCIEInfo for line: "00:16.0 \"Communication controller [0780]\" \"Intel Corporation [8086]\" \"C620 Series Chipset Family MEI Controller #1 [a1ba]\" -r09 \"ASUSTeK Computer Inc. [1043]\" \"Device [871e]\"", device: {}, error: device address is empty: {}
[info 2024-12-04 22:35:04 isolated_device.(*PCIDevice).IsBootVGA(gpu.go:397)] PCI address is boot_vga: /sys/devices/pci0000:00/0000:00:1c.3/0000:02:00.0/0000:03:00.0/boot_vga
[warning 2024-12-04 22:35:04 isolated_device.NewPCIDevice2(gpu.go:241)] fillPCIEInfo for line: "00:16.1 \"Communication controller [0780]\" \"Intel Corporation [8086]\" \"C620 Series Chipset Family MEI Controller #2 [a1bb]\" -r09 \"ASUSTeK Computer Inc. [1043]\" \"Device [871e]\"", device: {}, error: device address is empty: {}
[info 2024-12-04 22:35:04 isolated_device.(*PCIDevice).IsBootVGA(gpu.go:397)] PCI address is boot_vga: /sys/devices/pci0000:00/0000:00:1c.3/0000:02:00.0/0000:03:00.0/boot_vga
[warning 2024-12-04 22:35:04 isolated_device.NewPCIDevice2(gpu.go:241)] fillPCIEInfo for line: "00:16.4 \"Communication controller [0780]\" \"Intel Corporation [8086]\" \"C620 Series Chipset Family MEI Controller #3 [a1be]\" -r09 \"ASUSTeK Computer Inc. [1043]\" \"Device [871e]\"", device: {}, error: device address is empty: {}
[info 2024-12-04 22:35:04 isolated_device.(*PCIDevice).IsBootVGA(gpu.go:397)] PCI address is boot_vga: /sys/devices/pci0000:00/0000:00:1c.3/0000:02:00.0/0000:03:00.0/boot_vga
[warning 2024-12-04 22:35:05 isolated_device.NewPCIDevice2(gpu.go:241)] fillPCIEInfo for line: "00:1c.0 \"PCI bridge [0604]\" \"Intel Corporation [8086]\" \"C620 Series Chipset Family PCI Express Root Port #1 [a190]\" -rf9 \"\" \"\"", device: {}, error: device address is empty: {}
[info 2024-12-04 22:35:05 isolated_device.(*PCIDevice).IsBootVGA(gpu.go:397)] PCI address is boot_vga: /sys/devices/pci0000:00/0000:00:1c.3/0000:02:00.0/0000:03:00.0/boot_vga
[warning 2024-12-04 22:35:05 isolated_device.NewPCIDevice2(gpu.go:241)] fillPCIEInfo for line: "00:1c.3 \"PCI bridge [0604]\" \"Intel Corporation [8086]\" \"C620 Series Chipset Family PCI Express Root Port #4 [a193]\" -rf9 \"\" \"\"", device: {}, error: device address is empty: {}
[info 2024-12-04 22:35:05 isolated_device.(*PCIDevice).IsBootVGA(gpu.go:397)] PCI address is boot_vga: /sys/devices/pci0000:00/0000:00:1c.3/0000:02:00.0/0000:03:00.0/boot_vga
[info 2024-12-04 22:35:07 isolated_device.(*PCIDevice).IsBootVGA(gpu.go:397)] PCI address 02:00.0 is boot_vga: /sys/devices/pci0000:00/0000:00:1c.3/0000:02:00.0/0000:03:00.0/boot_vga
[info 2024-12-04 22:35:07 isolated_device.(*PCIDevice).IsBootVGA(gpu.go:397)] PCI address 03:00.0 is boot_vga: /sys/devices/pci0000:00/0000:00:1c.3/0000:02:00.0/0000:03:00.0/boot_vga
[info 2024-12-04 22:35:37 isolated_device.(*PCIDevice).forceBindVFIOPCIDriver(gpu.go:428)] {"bus_id":"1d:00.0","class_code":"0300","class_name":"VGA compatible controller","device_id":"2684","device_name":"Device","pcie_info":{"lane_width":16,"throughput":"31.50 GB/s","transfer_rate_per_lane":"16GT/s","version":"4.0"},"subdevice_id":"167c","subdevice_name":"Device","subvendor_id":"10de","subvendor_name":"NVIDIA Corporation","vendor_id":"10de","vendor_name":"NVIDIA Corporation"} already use vfio-pci driver
[info 2024-12-04 22:35:38 isolated_device.(*PCIDevice).forceBindVFIOPCIDriver(gpu.go:428)] {"bus_id":"20:00.0","class_code":"0300","class_name":"VGA compatible controller","device_id":"2684","device_name":"Device","pcie_info":{"lane_width":16,"throughput":"31.50 GB/s","transfer_rate_per_lane":"16GT/s","version":"4.0"},"subdevice_id":"167c","subdevice_name":"Device","subvendor_id":"10de","subvendor_name":"NVIDIA Corporation","vendor_id":"10de","vendor_name":"NVIDIA Corporation"} already use vfio-pci driver
[info 2024-12-04 22:35:39 isolated_device.(*PCIDevice).forceBindVFIOPCIDriver(gpu.go:428)] {"bus_id":"21:00.0","class_code":"0300","class_name":"VGA compatible controller","device_id":"2684","device_name":"Device","pcie_info":{"lane_width":16,"throughput":"31.50 GB/s","transfer_rate_per_lane":"16GT/s","version":"4.0"},"subdevice_id":"167c","subdevice_name":"Device","subvendor_id":"10de","subvendor_name":"NVIDIA Corporation","vendor_id":"10de","vendor_name":"NVIDIA Corporation"} already use vfio-pci driver
[warning 2024-12-04 22:35:41 appsrv.do_worker_watchdog(workers_watchdog.go:64)] WorkerManager HttpRequestWorkerManager has been busy for 2 cycles...
[info 2024-12-04 22:35:41 isolated_device.(*PCIDevice).forceBindVFIOPCIDriver(gpu.go:428)] {"bus_id":"24:00.0","class_code":"0300","class_name":"VGA compatible controller","device_id":"2684","device_name":"Device","pcie_info":{"lane_width":16,"throughput":"31.50 GB/s","transfer_rate_per_lane":"16GT/s","version":"4.0"},"subdevice_id":"167c","subdevice_name":"Device","subvendor_id":"10de","subvendor_name":"NVIDIA Corporation","vendor_id":"10de","vendor_name":"NVIDIA Corporation"} already use vfio-pci driver
[warning 2024-12-04 22:35:58 appsrv.(*SWorker).Detach(workers.go:125)] detach worker #24(0xc001f41890, detach) POST /hosts/32bc16ab-0e62-422c-8a79-2b9bcfe27094/probe-isolated-devices(POST /hosts/32bc16ab-0e62-422c-8a79-2b9bcfe27094/probe-isolated-devices) due to reason timeout after 1m0.000226909s
[error 2024-12-04 22:35:58 httperrors.HTTPError(httperrors.go:110)] Send error request process timeout
goroutine 417628 [running]:
runtime/debug.Stack()
/usr/lib/go/src/runtime/debug/stack.go:24 +0x5e
runtime/debug.PrintStack()
/usr/lib/go/src/runtime/debug/stack.go:16 +0x13
yunion.io/x/onecloud/pkg/httperrors.HTTPError({0x3b72be8?, 0xc001f41a10?}, {0x3b62250?, 0xc001f41710?}, {0x36307e7, 0x17}, 0x1f8, {0x35ece4a, 0xc}, {{0x36307e7, ...}, ...})
/root/go/src/yunion.io/x/onecloud/pkg/httperrors/httperrors.go:112 +0x3e5
yunion.io/x/onecloud/pkg/httperrors.JsonClientError(...)
/root/go/src/yunion.io/x/onecloud/pkg/httperrors/httperrors.go:117
yunion.io/x/onecloud/pkg/httperrors.GeneralServerError({0x3b72be8, 0xc001f41a10}, {0x3b62250, 0xc001f41710}, {0x3b50a80?, 0xc000e85ef0?})
/root/go/src/yunion.io/x/onecloud/pkg/httperrors/httperrors.go:122 +0xcd
yunion.io/x/onecloud/pkg/appsrv.(*Application).defaultHandle(0xc000944000, {0x3b62250?, 0xc001f41710}, 0xc000b36b00, {0xc00089ec78, 0x14})
/root/go/src/yunion.io/x/onecloud/pkg/appsrv/appsrv.go:425 +0xcb6
yunion.io/x/onecloud/pkg/appsrv.(*Application).ServeHTTP(0xc000944000, {0x3b62940, 0xc001af67e0}, 0xc000b36b00)
/root/go/src/yunion.io/x/onecloud/pkg/appsrv/appsrv.go:258 +0x20b
net/http.serverHandler.ServeHTTP({0x3b5a758?}, {0x3b62940?, 0xc001af67e0?}, 0x6?)
/usr/lib/go/src/net/http/server.go:2938 +0x8e
net/http.(*conn).serve(0xc0015986c0, {0x3b72be8, 0xc000fdcc00})
/usr/lib/go/src/net/http/server.go:2009 +0x5f4
created by net/http.(*Server).Serve in goroutine 1
/usr/lib/go/src/net/http/server.go:3086 +0x5cb
[info 2024-12-04 22:35:58 appsrv.(*Application).ServeHTTP(appsrv.go:289)] eRQl5K-V0hyE4u5NgID8BcyaLww= 504 926596-f3b957-34b980 POST /hosts/32bc16ab-0e62-422c-8a79-2b9bcfe27094/probe-isolated-devices (172.16.0.13:39552:compute_v2) 60000.59ms
[warning 2024-12-04 22:36:11 appsrv.do_worker_watchdog(workers_watchdog.go:64)] WorkerManager HttpRequestWorkerManager has been busy for 3 cycles...
[warning 2024-12-04 22:36:41 appsrv.do_worker_watchdog(workers_watchdog.go:64)] WorkerManager HttpRequestWorkerManager has been busy for 4 cycles...
[info 2024-12-04 22:36:43 isolated_device.(*isolatedDeviceManager).probeGPUS(isolated_device.go:167)] Add GPU device: 0 => &isolated_device.PCIDevice{Addr:"1d:00.0", ClassName:"VGA compatible controller", ClassCode:"0300", VendorName:"NVIDIA Corporation", VendorId:"10de", DeviceName:"Device", DeviceId:"2684", SubvendorName:"NVIDIA Corporation", SubvendorId:"10de", SubdeviceName:"Device", SubdeviceId:"167c", ModelName:"", RestIOMMUGroupDevs:[]*isolated_device.PCIDevice{(*isolated_device.PCIDevice)(0xc000249260)}, PCIEInfo:(*compute.IsolatedDevicePCIEInfo)(0xc00198c1c0)}
[info 2024-12-04 22:36:43 isolated_device.(*isolatedDeviceManager).probeGPUS(isolated_device.go:167)] Add GPU device: 1 => &isolated_device.PCIDevice{Addr:"20:00.0", ClassName:"VGA compatible controller", ClassCode:"0300", VendorName:"NVIDIA Corporation", VendorId:"10de", DeviceName:"Device", DeviceId:"2684", SubvendorName:"NVIDIA Corporation", SubvendorId:"10de", SubdeviceName:"Device", SubdeviceId:"167c", ModelName:"", RestIOMMUGroupDevs:[]*isolated_device.PCIDevice{(*isolated_device.PCIDevice)(0xc000852460)}, PCIEInfo:(*compute.IsolatedDevicePCIEInfo)(0xc001c913c0)}
[info 2024-12-04 22:36:43 isolated_device.(*isolatedDeviceManager).probeGPUS(isolated_device.go:167)] Add GPU device: 2 => &isolated_device.PCIDevice{Addr:"21:00.0", ClassName:"VGA compatible controller", ClassCode:"0300", VendorName:"NVIDIA Corporation", VendorId:"10de", DeviceName:"Device", DeviceId:"2684", SubvendorName:"NVIDIA Corporation", SubvendorId:"10de", SubdeviceName:"Device", SubdeviceId:"167c", ModelName:"", RestIOMMUGroupDevs:[]*isolated_device.PCIDevice{(*isolated_device.PCIDevice)(0xc002b507e0)}, PCIEInfo:(*compute.IsolatedDevicePCIEInfo)(0xc0023b8580)}
[info 2024-12-04 22:36:43 isolated_device.(*isolatedDeviceManager).probeGPUS(isolated_device.go:167)] Add GPU device: 3 => &isolated_device.PCIDevice{Addr:"24:00.0", ClassName:"VGA compatible controller", ClassCode:"0300", VendorName:"NVIDIA Corporation", VendorId:"10de", DeviceName:"Device", DeviceId:"2684", SubvendorName:"NVIDIA Corporation", SubvendorId:"10de", SubdeviceName:"Device", SubdeviceId:"167c", ModelName:"", RestIOMMUGroupDevs:[]*isolated_device.PCIDevice{(*isolated_device.PCIDevice)(0xc00299a700)}, PCIEInfo:(*compute.IsolatedDevicePCIEInfo)(0xc0028b9140)}
[info 2024-12-04 22:36:46 isolated_device.(*PCIDevice).forceBindVFIOPCIDriver(gpu.go:428)] {"bus_id":"24:00.0","class_code":"0300","class_name":"VGA compatible controller","device_id":"2684","device_name":"Device","pcie_info":{"lane_width":16,"throughput":"31.50 GB/s","transfer_rate_per_lane":"16GT/s","version":"4.0"},"subdevice_id":"167c","subdevice_name":"Device","subvendor_id":"10de","subvendor_name":"NVIDIA Corporation","vendor_id":"10de","vendor_name":"NVIDIA Corporation"} already use vfio-pci driver
[info 2024-12-04 22:36:46 isolated_device.SyncDeviceInfo(isolated_device.go:478)] Update 7d2e69e1-e19c-4e0e-8d4c-01f61f9ac443 isolated_device: {"addr":"24:00.0","detected_on_host":true,"dev_type":"GPU-HPC","host_id":"32bc16ab-0e62-422c-8a79-2b9bcfe27094","id":"7d2e69e1-e19c-4e0e-8d4c-01f61f9ac443","model":"Device","numa_node":0,"pcie_info":{"lane_width":16,"throughput":"31.50 GB/s","transfer_rate_per_lane":"16GT/s","version":"4.0"},"vendor_device_id":"10de:2684"}
[info 2024-12-04 22:36:46 isolated_device.(*PCIDevice).forceBindVFIOPCIDriver(gpu.go:428)] {"bus_id":"21:00.0","class_code":"0300","class_name":"VGA compatible controller","device_id":"2684","device_name":"Device","pcie_info":{"lane_width":16,"throughput":"31.50 GB/s","transfer_rate_per_lane":"16GT/s","version":"4.0"},"subdevice_id":"167c","subdevice_name":"Device","subvendor_id":"10de","subvendor_name":"NVIDIA Corporation","vendor_id":"10de","vendor_name":"NVIDIA Corporation"} already use vfio-pci driver
[info 2024-12-04 22:36:46 isolated_device.SyncDeviceInfo(isolated_device.go:478)] Update 7c119893-5082-43e9-80e5-0332923fe051 isolated_device: {"addr":"21:00.0","detected_on_host":true,"dev_type":"GPU-HPC","host_id":"32bc16ab-0e62-422c-8a79-2b9bcfe27094","id":"7c119893-5082-43e9-80e5-0332923fe051","model":"Device","numa_node":0,"pcie_info":{"lane_width":16,"throughput":"31.50 GB/s","transfer_rate_per_lane":"16GT/s","version":"4.0"},"vendor_device_id":"10de:2684"}
[info 2024-12-04 22:36:46 isolated_device.(*PCIDevice).forceBindVFIOPCIDriver(gpu.go:428)] {"bus_id":"20:00.0","class_code":"0300","class_name":"VGA compatible controller","device_id":"2684","device_name":"Device","pcie_info":{"lane_width":16,"throughput":"31.50 GB/s","transfer_rate_per_lane":"16GT/s","version":"4.0"},"subdevice_id":"167c","subdevice_name":"Device","subvendor_id":"10de","subvendor_name":"NVIDIA Corporation","vendor_id":"10de","vendor_name":"NVIDIA Corporation"} already use vfio-pci driver
[info 2024-12-04 22:36:46 isolated_device.SyncDeviceInfo(isolated_device.go:478)] Update eb381ab8-efbf-419d-8a98-02221ab172b8 isolated_device: {"addr":"20:00.0","detected_on_host":true,"dev_type":"GPU-HPC","host_id":"32bc16ab-0e62-422c-8a79-2b9bcfe27094","id":"eb381ab8-efbf-419d-8a98-02221ab172b8","model":"Device","numa_node":0,"pcie_info":{"lane_width":16,"throughput":"31.50 GB/s","transfer_rate_per_lane":"16GT/s","version":"4.0"},"vendor_device_id":"10de:2684"}
[info 2024-12-04 22:36:46 isolated_device.(*PCIDevice).forceBindVFIOPCIDriver(gpu.go:428)] {"bus_id":"1d:00.0","class_code":"0300","class_name":"VGA compatible controller","device_id":"2684","device_name":"Device","pcie_info":{"lane_width":16,"throughput":"31.50 GB/s","transfer_rate_per_lane":"16GT/s","version":"4.0"},"subdevice_id":"167c","subdevice_name":"Device","subvendor_id":"10de","subvendor_name":"NVIDIA Corporation","vendor_id":"10de","vendor_name":"NVIDIA Corporation"} already use vfio-pci driver
[info 2024-12-04 22:36:46 isolated_device.SyncDeviceInfo(isolated_device.go:478)] Update f5b40d5a-acc5-4b60-8445-49850afb6b9e isolated_device: {"addr":"1d:00.0","detected_on_host":true,"dev_type":"GPU-HPC","host_id":"32bc16ab-0e62-422c-8a79-2b9bcfe27094","id":"f5b40d5a-acc5-4b60-8445-49850afb6b9e","model":"Device","numa_node":0,"pcie_info":{"lane_width":16,"throughput":"31.50 GB/s","transfer_rate_per_lane":"16GT/s","version":"4.0"},"vendor_device_id":"10de:2684"}
Thanks!!
wanyaoqi commented
@chenjacken Every time this page is visited, the host's passthrough devices are probed again. This API can time out; we will optimize it.
chenjacken commented
Got it, okay, thanks!!