amd/xdna-driver

Failed to open KMQ device fd (err=22): Invalid argument TEST FAILED! when running the test example.

bittervan opened this issue · 15 comments

I tried to build the driver in two environments: ubuntu-22.04.04 + vitis-2022.2 and ubuntu-24.04 + vitis-2023.2. The kernel I cloned from AMD-SW/linux is 6.8.7+. The device I am using is a ThinkBook 14+ with an 8845HS. AMD VM support is enabled in the BIOS, but I cannot find any options related to IOMMU.

After following the instructions in the readme, neither environment can run the test example. The output is:

$ ./example_build/example_noop_test ../tools/bins/1502_00/validate.xclbin
Host test code start...
Host test code is creating device object...
ERROR: Caught exception: Failed to open KMQ device fd (err=22): Invalid argument
TEST FAILED!

The dmesg command reports:

[  255.039799] amdxdna 0000:65:00.1: set mpnpu_clock = 600 mhz
[  255.059899] amdxdna 0000:65:00.1: set npu_hclock = 1024 mhz
[  255.101933] [drm] Initialized amdxdna_accel_driver 1.0.0 20240124 for 0000:65:00.1 on minor 0
[  278.109565] amdxdna 0000:65:00.1: amdxdna_drm_open: SVA bind device failed, ret -28

What is the possible cause of this problem?

Thanks for your attention in advance.

Please use the Linux kernel documented in the readme and try again.

The kernel in the document is 6.8.5, but when I cloned the kernel using the command in the document (same repo and same branch), I got 6.8.7+. Maybe the problem is not caused by the kernel version?

Also, neither the config from my local machine nor the one downloaded as instructed in the document works.

I also noticed that the failure comes from the call to iommu_sva_bind_device in the kernel. Does it have something to do with virtualization or the IOMMU?

The reason we need the special kernel is SVA support. The driver does not work without a kernel that supports it.

Thanks, I'll try to debug the kernel to see if the SVA is correctly enabled when following the instructions in the document.

I think the IOMMU is correctly enabled since in the dmesg I can see

[    0.596818] pci 0000:65:00.1: Adding to iommu group 4

where 65:00.1 is the AMD IPU(NPU). But the iommu_sva_bind_device in xdna-driver/src/driver/amdxdna/amdxdna_drv.c:62 keeps on returning -28, which denotes ENOSPC: No space left on device.
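The two error numbers in this thread (err=22 from the test program and ret -28 from dmesg) can be decoded with Python's standard `errno` module. This is only a small sketch for readers following along; the helper name `describe_errno` is made up for illustration and is not part of the driver or its tools.

```python
import errno
import os


def describe_errno(ret: int) -> str:
    """Return 'NAME: message' for a return code such as 22 or -28.

    Kernel functions report errors as negative errno values, while
    user space usually prints the positive number, so the sign is dropped.
    """
    code = abs(ret)
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


if __name__ == "__main__":
    # 22 is what the test program printed; -28 is what dmesg reported.
    for ret in (22, -28):
        print(ret, "->", describe_errno(ret))
```

On Linux, 22 maps to EINVAL and 28 maps to ENOSPC, which matches the messages quoted above.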

I found that the key problem is

iommu_sva_bind_device
	-> iommu_alloc_mm_data
		-> iommu_alloc_global_pasid
			dev->iommu->max_pasids is zero

max_pasids has been zero ever since iommu_init_device. I think the IOMMU on my laptop is not correctly enabled in the BIOS. On my laptop, the IOMMU option is fixed to auto and cannot be changed by users.
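The call chain above can be pictured with a simplified Python model. This is not kernel code: the function names only mirror the ones in the chain, and the sentinel `PASID_INVALID`, the `first_pasid` parameter, and the return conventions are assumptions made for illustration. It only shows how a zero max_pasids could turn into the -28 seen in dmesg.

```python
import errno

# Stand-in sentinel for "no PASID could be allocated" (assumed value,
# chosen for this sketch only).
PASID_INVALID = -1


def alloc_global_pasid(max_pasids: int, first_pasid: int = 1) -> int:
    """Model of PASID allocation: fail if the device reports no PASIDs."""
    if max_pasids == 0:
        # A device whose max_pasids is zero is treated as not supporting PASID.
        return PASID_INVALID
    if first_pasid > max_pasids - 1:
        # No usable PASID in the range [first_pasid, max_pasids - 1].
        return PASID_INVALID
    return first_pasid


def sva_bind_device(max_pasids: int) -> int:
    """Model of the bind step: 0 on success, negative errno on failure."""
    pasid = alloc_global_pasid(max_pasids)
    if pasid == PASID_INVALID:
        return -errno.ENOSPC
    return 0


if __name__ == "__main__":
    print("max_pasids=0     ->", sva_bind_device(0))
    print("max_pasids=65536 ->", sva_bind_device(1 << 16))
```

In this model the bind fails for any process as long as the device reports zero PASIDs, which is consistent with the error appearing on every run regardless of the kernel config.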

I am considering purchasing a new device. Could you please tell me which laptop you are currently using? Also, can the driver be used on the new Ryzen AI desktop CPUs such as the Ryzen 8700G?

For driver development work, we don't use production laptops. I believe the Ryzen 8700G works, since I know that someone has successfully run the driver on it.

Thanks a lot. I'll try the driver on another device.

My team has some Ryzen AI desktop CPUs and it works on them too.
We also have various other laptops and mini-PCs. In production I use an HP ZBook Power 15.6 inch G10 A Mobile Workstation PC.
It looks like the problem often lies with the many different BIOSes in the wild, which sometimes add strange behavior. :-(
I have pushed the 6.8.8 kernel to https://github.com/AMD-SW/linux
It includes a few AMD-related patches from upstream, which might help.

Cool! I'll try it and provide feedback ASAP.

Still doesn't work, same problem.

Too bad. ☹️

Hi @bittervan, the upstream 6.10.0-rc* kernel has all the AMD IOMMU SVA changes. You may want to try it.

Thanks, I'll check it out.

Now it works perfectly!