tenstorrent/luwen

Possible false "outstanding PCIE DMA request" error

Opened this issue · 2 comments

Running the test eth_fw_data_check causes luwen to think that a DMA transfer is still in progress, so the chip hangs with the following error:
It is not currently safe to communicate with ARC because, there is an outstanding PCIE DMA request
I did some digging, and it looks like the register ARC_CSM.ARC_PCIE_DMA_REQUEST.trigger has a nonzero value, whereas when I read that same register using the old backend its value is 0.
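To make the symptom concrete, here is a minimal sketch of the kind of guard and cross-check described above. It is not luwen's actual code: `check_arc_safe`, `compare_backends`, `OutstandingDmaError`, and the `read_reg` callables are hypothetical stand-ins, and only the register name is taken from this report.

```python
from typing import Callable, Tuple

# Register name as quoted in this report; everything else here is hypothetical.
DMA_TRIGGER_REG = "ARC_CSM.ARC_PCIE_DMA_REQUEST.trigger"


class OutstandingDmaError(RuntimeError):
    """Raised when the DMA trigger register reads nonzero."""


def check_arc_safe(read_reg: Callable[[str], int]) -> None:
    """Refuse ARC communication if the trigger register is nonzero.

    `read_reg` is an assumed callable mapping a register name to its value.
    """
    value = read_reg(DMA_TRIGGER_REG)
    if value != 0:
        raise OutstandingDmaError(
            f"outstanding PCIE DMA request ({DMA_TRIGGER_REG} = {value:#x})"
        )


def compare_backends(
    read_a: Callable[[str], int], read_b: Callable[[str], int]
) -> Tuple[int, int, bool]:
    """Read the trigger register through two backends and report agreement."""
    a = read_a(DMA_TRIGGER_REG)
    b = read_b(DMA_TRIGGER_REG)
    return a, b, a == b
```

If two backends disagree on this register, as reported here, the guard may be tripping on a bad read rather than on a real in-flight DMA, which is what would make the error a false positive.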

Reading from SPI or running test-pcie-dma produces the same error. This was observed on two different WH systems; in both cases the chip was still accessible afterwards and passed test-pcie-dma.

I committed a fix in v0.4.5 that resolved the issue on the system where I managed to reproduce it. Please test it and let me know whether you see any improvement.