The IP version looks too old to compile on Alveo U45N/Alveo SN1022?
PriceHuang opened this issue · 7 comments
Hi zhguanw,
I am trying to port RecoNIC to the Alveo U45N/SN1022, but RecoNIC's build Tcl scripts use older IP versions: P4 uses v1.0 and ERNIC uses v3.1. ERNIC v3.1 does not work with the U45N's part (xcu26-vsva1365-2VL-e), which causes problems.
FYI, I have already put OpenNIC on the U45N, and it works well.
Hi @PriceHuang,
Thanks for your interest.
May I know which Vivado version you are using for the U45N and SN1022?
Upgrading VitisNetP4 should be very simple. Upgrading ERNIC from v3.1 to v4.0 requires some effort, as its register space has changed. We don't have any plans to migrate ERNIC from v3.1 to v4.0 this year.
If you're willing to migrate ERNIC from v3.1 to v4.0, we are happy to guide you. Otherwise, please stay tuned.
Best regards,
Guanwen
Thanks for your reply! I was using Vivado 2023.1 and have now switched to Vivado 2021.2, which solved the IP version problems. The problem now is that the driver does not work with the project: after loading the driver with insmod, I still cannot find the device with ifconfig.
I can see the error "onic_pci_probe : onic_enable_cmac () failed with -16".
Best regards,
PriceHuang
Hi @PriceHuang,
>The problem now is that the driver does not work with the project: after loading the driver with insmod, I still cannot find the device with ifconfig.
>I can see the error "onic_pci_probe : onic_enable_cmac () failed with -16".
"onic_enable_cmac() failed with -16" means the CMAC component was not reset properly. Are you using the same Ubuntu version and Linux kernel mentioned in the repo? We have only tested on Ubuntu 20.04 with Linux kernel version 5.4.0-125-generic.
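For reference when reading the probe error: Linux kernel functions conventionally return negative errno values, so -16 corresponds to errno 16, which is EBUSY. The sketch below decodes such a return code using Python's standard library. The helper name `decode_kernel_error` is hypothetical, and linking EBUSY to an incomplete CMAC reset follows the explanation above rather than anything this snippet checks.

```python
import errno
import os

def decode_kernel_error(ret):
    """Map a negative kernel-style return code to (errno name, message)."""
    code = abs(ret)
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)

# The value reported by onic_pci_probe in this thread.
name, message = decode_kernel_error(-16)
print(name, "-", message)
```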
BTW, in your current project, are you using the U45N FPGA board instead of the U250?
Best regards,
Guanwen
Yes, I am running Ubuntu 20.04, but the Linux kernel is 5.15.0-84-generic, and the board I am using is the U45N.
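A quick way to see whether the running kernel matches the tested one is to compare release strings. This is a minimal sketch that assumes Ubuntu-style release strings such as those quoted in this thread. The helper names `kernel_numbers` and `same_kernel_series` are hypothetical and not part of RecoNIC.

```python
import platform
import re

TESTED_KERNEL = "5.4.0-125-generic"  # kernel version quoted in this thread

def kernel_numbers(release):
    """Extract leading numeric fields, e.g. '5.15.0-84-generic' -> (5, 15, 0, 84)."""
    return tuple(int(n) for n in re.findall(r"\d+", release)[:4])

def same_kernel_series(running, tested=TESTED_KERNEL):
    """True when major and minor versions match the tested kernel."""
    return kernel_numbers(running)[:2] == kernel_numbers(tested)[:2]

running = platform.release()
print(running, "matches tested series:", same_kernel_series(running))
```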
I solved the problem by commenting out lines 1084 to 1086 in "onic_main.c".
Now I have hit a new problem: the rdma_test test case ends up with QP2 in a FATAL state when I test a 128 KB payload in the SEND_RECV test.
Best regards,
PriceHuang
Hi @PriceHuang ,
Good to hear that you solved the problem.
>Now I have hit a new problem: the rdma_test test case ends up with QP2 in a FATAL state when I test a 128 KB payload in the SEND_RECV test.
The issue is caused by "RQE_SIZE" (https://github.com/Xilinx/RecoNIC/blob/main/lib/rdma_api.h#L24), which gives 256*256 = 64KB per RQ. You can simply set it to 512 to work around the issue. I'll push an update soon to make rqe_size configurable.
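The arithmetic behind this workaround can be sketched as follows. The sketch assumes that RQE_SIZE is the size in bytes of one receive-queue entry and that each RQ holds 256 entries. The thread only states "256*256 = 64KB", so the name `RQ_DEPTH` and the helper functions are hypothetical.

```python
RQ_DEPTH = 256  # assumed number of entries per RQ (hypothetical name)

def rq_capacity_bytes(rqe_size, rq_depth=RQ_DEPTH):
    """Total bytes one RQ can hold under the assumption above."""
    return rqe_size * rq_depth

def payload_fits(payload_bytes, rqe_size):
    """True when the payload does not exceed the RQ capacity."""
    return payload_bytes <= rq_capacity_bytes(rqe_size)

payload = 128 * 1024  # the 128 KB SEND_RECV payload from this thread
for rqe_size in (256, 512):
    print(rqe_size, rq_capacity_bytes(rqe_size), payload_fits(payload, rqe_size))
```

Under these assumptions, an RQE_SIZE of 256 yields 64 KB, which is too small for a 128 KB payload, while 512 yields exactly 128 KB.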
Thanks,
Guanwen
Hi @zhguanw-amd ,
Later that night, I solved the problem simply by rebooting the host.
I noticed that the RecoNIC driver runs in user space, so could the problem be caused by running out of memory? I can see memory allocation operations, but I cannot find any corresponding free operations.
Thanks,
PriceHuang
Hi @PriceHuang ,
In send_recv.c, there are some buffers that we forgot to free, but that has nothing to do with the QP fatal error. If you check ./lib/*, you will find the free operations. QP fatal means that some ERNIC registers hold wrong values. In your case, you are trying to send a payload larger than an RQ buffer can accommodate.
We will update the send_recv example later. BTW, we'll also add hardware optimizations gradually. Please stay tuned.
Thanks,
Guanwen