udev rule not running after loading fpga image
cmoore1776 opened this issue · 6 comments
Summary
The udev rule created by add_udev_rules.sh does not match the device ID used after loading an FPGA image.
The rule, which is deployed to /etc/udev/rules.d/9999-presistent-fpga.rules, only matches on:
ATTR{device}=="0x1041"
ATTR{device}=="0x1042"
but it needs to also match on:
ATTR{device}=="0xf001"
Reproduction steps
- Launch an F1 instance on the latest AL2 FPGA Developer AMI
- Deploy the aws-fpga SDK
- Load an fpga image, e.g.
fpga-load-local-image -S 0 -I agfi-xxxxxSOMExIDxxxxx
- Note that the permissions on /sys/devices/pci0000:00/0000:00:1d.0/resource* are 444:
$ ls -lah /sys/devices/pci0000\:00/0000\:00\:1d.0/resource*
-r--r--r-- 1 root root 4.0K Apr 27 16:14 /sys/devices/pci0000:00/0000:00:1d.0/resource
-r--r--r-- 1 root root 32M Apr 27 16:14 /sys/devices/pci0000:00/0000:00:1d.0/resource0
-r--r--r-- 1 root root 2.0M Apr 27 16:14 /sys/devices/pci0000:00/0000:00:1d.0/resource1
-r--r--r-- 1 root root 64K Apr 27 16:14 /sys/devices/pci0000:00/0000:00:1d.0/resource2
-r--r--r-- 1 root root 64K Apr 27 16:14 /sys/devices/pci0000:00/0000:00:1d.0/resource2_wc
-r--r--r-- 1 root root 128G Apr 27 16:14 /sys/devices/pci0000:00/0000:00:1d.0/resource4
-r--r--r-- 1 root root 128G Apr 27 16:14 /sys/devices/pci0000:00/0000:00:1d.0/resource4_wc
Also note the device ID after loading the image:
$ sudo udevadm info -a -p /devices/pci0000:00/0000:00:1d.0 | grep "ATTR{device}"
ATTR{device}=="0xf001"
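The mismatch described above can be checked mechanically. The following is a minimal sketch, not part of the SDK: the function name rules_match_device is hypothetical, and it only looks for a literal ATTR{device}=="..." clause in a rules file, so it would miss rules that match the device some other way.

```shell
#!/bin/sh
# Succeeds (exit 0) when RULES_FILE contains a match clause for DEVICE_ID.
# Hypothetical helper for illustration; not part of aws-fpga.
# grep -F treats the pattern as a fixed string, so "{" and "}" are not special.
rules_match_device() {
    rules_file=$1
    device_id=$2
    grep -qF "ATTR{device}==\"${device_id}\"" "$rules_file"
}

# Example usage with the paths from this report (left commented out because
# it needs the F1 instance):
#   id=$(cat /sys/devices/pci0000:00/0000:00:1d.0/device)
#   rules_match_device /etc/udev/rules.d/9999-presistent-fpga.rules "$id" \
#       || echo "no udev rule matches device ID $id"
```

On an affected instance, the commented usage would be expected to report that no rule matches 0xf001.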
Fix
Add the following two lines to /etc/udev/rules.d/9999-presistent-fpga.rules:
ATTR{vendor}=="0x1d0f", ATTR{device}=="0xf001", RUN+="/opt/aws/bin/change-fpga-perm.sh %k"
ATTR{vendor}=="0x1d0f", ATTR{device}=="0xf001", ACTION=="add", RUN+="/opt/aws/bin/change-fpga-perm.sh %k"
With these rules in place, the numbered resource files are 666 after loading an image:
$ ls -lah /sys/devices/pci0000\:00/0000\:00\:1d.0/resourc*
-r--r--r-- 1 root root 4.0K May 18 14:34 /sys/devices/pci0000:00/0000:00:1d.0/resource
-rw-rw-rw- 1 root root 32M May 18 14:34 /sys/devices/pci0000:00/0000:00:1d.0/resource0
-rw-rw-rw- 1 root root 2.0M May 18 14:34 /sys/devices/pci0000:00/0000:00:1d.0/resource1
-rw-rw-rw- 1 root root 64K May 18 14:34 /sys/devices/pci0000:00/0000:00:1d.0/resource2
-rw-rw-rw- 1 root root 64K May 18 14:34 /sys/devices/pci0000:00/0000:00:1d.0/resource2_wc
-rw-rw-rw- 1 root root 128G May 18 14:34 /sys/devices/pci0000:00/0000:00:1d.0/resource4
-rw-rw-rw- 1 root root 128G May 18 14:34 /sys/devices/pci0000:00/0000:00:1d.0/resource4_wc
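To compare results between setups without reading ls output by eye, a small check like the one below may help. It is a sketch under assumptions: list_wrong_modes is a hypothetical helper name, and it relies on GNU coreutils stat (the -c option), which should be available on Amazon Linux 2 and CentOS.

```shell
#!/bin/sh
# Prints "MODE PATH" for every given path whose octal mode differs from WANT.
# Hypothetical helper for illustration; assumes GNU stat (-c '%a').
list_wrong_modes() {
    want=$1
    shift
    for path in "$@"; do
        mode=$(stat -c '%a' "$path") || return 1
        [ "$mode" = "$want" ] || printf '%s %s\n' "$mode" "$path"
    done
}

# Example usage (commented out because it needs the F1 instance). The glob
# skips the plain "resource" file, which stays 444 in the listing above:
#   list_wrong_modes 666 /sys/devices/pci0000:00/0000:00:1d.0/resource[0-9]*
```

Empty output means every listed file already has the wanted mode.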
Thanks for reporting this.
For reproduction step 3
fpga-load-local-image -S 0 -I agfi-xxxxxSOMExIDxxxxx
Does the image loaded specify a device ID as per https://github.com/aws/aws-fpga/blob/4750aacb4dac9d464b099b27e4337220cf0b0713/hdk/cl/examples/cl_dram_dma_hlx/README.md#create-example-design-gui ?
set ::env(device_id) "0xF001"
set ::env(vendor_id) "0x1D0F"
set ::env(subsystem_id) "0x1D51"
set ::env(subsystem_vendor_id) "0xFEDC"
For example, the cl_dram_dma example is configured to use 0xf001.
If so, which device_id is specified?
> Does the image loaded specify a device ID as per https://github.com/aws/aws-fpga/blob/4750aacb4dac9d464b099b27e4337220cf0b0713/hdk/cl/examples/cl_dram_dma_hlx/README.md#create-example-design-gui ?
Yes, 0xf001 is based on using the device_id provided in the example.
I think I understand the issue, so let me rephrase.
When following the steps in the HOW TO, setting a device ID of "0xF001" and then running the udev permission script, the PCIe device does not have the permissions properly applied.
Therefore:
- The udev script should be corrected to include the default example device ID
- The documentation should be updated to note that when using a non-default device ID, the udev script should be patched by the user to enable non-root access to the FPGA device
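For a non-default device ID, the patch a user would apply could look like the sketch below. It prints two rule lines modeled on the ones proposed in the fix above. The function name fpga_udev_rules is hypothetical, and the lowercasing step rests on an assumption: that udev compares attribute strings case-sensitively and that sysfs reports hex digits in lowercase, as the udevadm output in this thread suggests (0xf001, while the Tcl setting is written "0xF001").

```shell
#!/bin/sh
# Prints two udev rule lines for DEVICE_ID, modeled on the proposed fix.
# Hypothetical helper for illustration; not part of aws-fpga.
# Assumption: udev matching is case-sensitive and sysfs uses lowercase hex,
# so the ID is lowercased before it is written into the rule.
fpga_udev_rules() {
    device_id=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    helper='/opt/aws/bin/change-fpga-perm.sh %k'
    printf 'ATTR{vendor}=="0x1d0f", ATTR{device}=="%s", RUN+="%s"\n' \
        "$device_id" "$helper"
    printf 'ATTR{vendor}=="0x1d0f", ATTR{device}=="%s", ACTION=="add", RUN+="%s"\n' \
        "$device_id" "$helper"
}

# Example usage (commented out because it modifies system configuration):
#   fpga_udev_rules 0xF001 | sudo tee -a /etc/udev/rules.d/9999-presistent-fpga.rules
#   sudo udevadm control --reload-rules
```

Generating the lines from one function keeps the two rules in sync if the helper script path or vendor ID ever needs to change.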
Notes:
- For reproduction, add export AWS_FPGA_ALLOW_NON_ROOT=y to the setup step
- For reproduction, add export AWS_FPGA_SDK_OTHERS=y to the setup step
Hello @shamelesscookie,
I have been trying to reproduce the issue you described, along with the fix in PR #561.
I haven't been able to reproduce the device permissions you list under step 4.
[centos@ip-172-31-83-184 ~]$ ls -lah /sys/devices/pci0000\:00/0000\:00\:1d.0/resource*
-r--r--r-- 1 root root 4.0K Jun 15 00:47 /sys/devices/pci0000:00/0000:00:1d.0/resource
-rw------- 1 root root 32M Jun 15 00:47 /sys/devices/pci0000:00/0000:00:1d.0/resource0
-rw------- 1 root root 2.0M Jun 15 00:47 /sys/devices/pci0000:00/0000:00:1d.0/resource1
-rw------- 1 root root 64K Jun 15 00:47 /sys/devices/pci0000:00/0000:00:1d.0/resource2
-rw------- 1 root root 64K Jun 15 00:47 /sys/devices/pci0000:00/0000:00:1d.0/resource2_wc
-rw------- 1 root root 128G Jun 15 00:47 /sys/devices/pci0000:00/0000:00:1d.0/resource4
-rw------- 1 root root 128G Jun 15 00:47 /sys/devices/pci0000:00/0000:00:1d.0/resource4_wc
[centos@ip-172-31-83-184 ~]$ sudo udevadm info -a -p /devices/pci0000:00/0000:00:1d.0 | grep "ATTR{device}"
ATTR{device}=="0xf001"
Are you using any environment variables that are not listed in your reproduction steps?
As a note, I have been using the public cl_dram_dma AGFI (agfi-0b5c35827af676702) with a PCI Device ID of 0xF001.
Hello!
Is there anything that AWS can help to resolve this issue? If the issue is resolved, we're curious to know the resolution.
Thank you!