Writes to / reads from AxiRam by the hardware DUT hang when adding while loops to the test bench
Cocotb version: 0.1.24
Hardware simulator: QuestaSim
Hello, I've encountered an issue while working on my project. I've instantiated an AxiRam in Cocotb to serve as the model RAM for the DUT (Device Under Test), enabling me to initialize its values from the Cocotb test bench. Initially, everything works well, especially with smaller-scale test cases. However, when attempting to test with a larger input scale, I encountered a problem.
To handle the larger input scale, I introduced a while loop in the test bench to wait for responses. Unfortunately, this seems to cause the write and read operations of the AxiRam to hang.
Due to the complexity of the DUT, I'm unable to provide the entire code. However, I can confirm that I haven't altered any inputs to the DUT itself. The only change was adding a while loop that keeps the test bench running while it waits. In the console output, I can see the write and read operations hang.
For reference, here's an example of the test bench, in which the DUT reads from the AxiRam, performs its calculation, and writes the data back at the end:
dut.weight_prefetcher_req_valid.value = 1 # enable the prefetcher
dut.weight_prefetcher_req.req_opcode.value = 0 # 00 is for weight bank requests
dut.weight_prefetcher_req.start_address.value = byte_per_weight_block * (i % weight_matrix_iteration) # start address of the weight bank
dut.weight_prefetcher_req.in_features.value = weight_matrix_size[1] # number of input features
dut.weight_prefetcher_req.out_features.value = weight_matrix_size[0] # number of output features
dut.weight_prefetcher_req.nodeslot.value = 0 # not used for weight bank requests
dut.weight_prefetcher_req.nodeslot_precision.value = 1 # 01 is for fixed 8-bit precision
dut.weight_prefetcher_req.neighbour_count.value = 0 # not used for weight bank requests
# --------------------------------------------------
dut.feature_prefetcher_req_valid.value = 1 # enable the prefetcher
dut.feature_prefetcher_req.req_opcode.value = 0 # 00 is for weight bank requests
dut.feature_prefetcher_req.start_address.value = weigth_address_range + byte_per_input_block * (i // weight_matrix_iteration) # start address of the feature bank
dut.feature_prefetcher_req.in_features.value = input_matrix_size[1] # number of input features
dut.feature_prefetcher_req.out_features.value = input_matrix_size[0] # number of output features
dut.feature_prefetcher_req.nodeslot.value = 0 # not used for weight bank requests
dut.feature_prefetcher_req.nodeslot_precision.value = 1 # 01 is for fixed 8-bit precision
dut.feature_prefetcher_req.neighbour_count.value = 0 # not used for weight bank requests
# --------------------------------------------------
dut.nsb_fte_req_valid.value = 1 # enable the fte
dut.nsb_fte_req.precision.value = 1 # 01 is for fixed 8-bit precision
dut.layer_config_out_channel_count.value = input_matrix_size[0] # here we used the first dimension of the input matrix as output channel count
dut.layer_config_out_features_count.value = weight_matrix_size[0] # here we used the first dimension of the weight matrix as output features count
dut.layer_config_out_features_address_msb_value.value = (writeback_address >> 32) & 0b11 # 2 is for the msb of 34 bits address
dut.layer_config_out_features_address_lsb_value.value = writeback_address & 0xFFFFFFFF # 0 for the rest of the address
dut.writeback_offset.value = offset # 0 for the writeback offset
#---------------------------------------------------
print("Done instructing fte")
i = 0
while True:
    await RisingEdge(dut.clk)
    await Timer(10, units="ns")
    if dut.nsb_fte_resp_valid.value == 1:
        done = True
        break
    if i==1000000:
        done = False
        break
    i+=1
reset_fte(dut)
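The address arithmetic in the request fields above can be checked without a simulator. The following is a minimal sketch, not the actual test bench: the function name and the block sizes are hypothetical, and the parameters stand in for `byte_per_weight_block`, `byte_per_input_block`, `weight_matrix_iteration` and `weigth_address_range`. Because the fragment later reuses `i` as its timeout counter, the sketch takes the iteration index as an explicit argument.

```python
def request_addresses(i, weight_blocks, bytes_per_weight_block,
                      bytes_per_input_block, weight_region_bytes):
    """Return (weight_start, feature_start) byte addresses for iteration i.

    Mirrors the two start_address expressions in the test bench:
    the weight block cycles with i % weight_blocks, while the input
    block advances once every weight_blocks iterations.
    """
    weight_start = bytes_per_weight_block * (i % weight_blocks)
    feature_start = (weight_region_bytes
                     + bytes_per_input_block * (i // weight_blocks))
    return weight_start, feature_start


# Hypothetical sizes: 4 weight blocks of 256 bytes, input blocks of 128 bytes,
# with the feature region starting right after the weight region.
for i in range(6):
    print(i, request_addresses(i, 4, 256, 128, 4 * 256))
```

Printing the pairs for the first few iterations makes it easy to confirm that the two regions never overlap for the sizes actually used.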
This test bench passed successfully, and all the read and write logs in the console appear to be correct. However, the problem appears once I introduce an additional while loop that waits for the prefetcher responses, as shown below:
dut.weight_prefetcher_req_valid.value = 1 # enable the prefetcher
dut.weight_prefetcher_req.req_opcode.value = 0 # 00 is for weight bank requests
dut.weight_prefetcher_req.start_address.value = byte_per_weight_block * (i % weight_matrix_iteration) # start address of the weight bank
dut.weight_prefetcher_req.in_features.value = weight_matrix_size[1] # number of input features
dut.weight_prefetcher_req.out_features.value = weight_matrix_size[0] # number of output features
dut.weight_prefetcher_req.nodeslot.value = 0 # not used for weight bank requests
dut.weight_prefetcher_req.nodeslot_precision.value = 1 # 01 is for fixed 8-bit precision
dut.weight_prefetcher_req.neighbour_count.value = 0 # not used for weight bank requests
# --------------------------------------------------
dut.feature_prefetcher_req_valid.value = 1 # enable the prefetcher
dut.feature_prefetcher_req.req_opcode.value = 0 # 00 is for weight bank requests
dut.feature_prefetcher_req.start_address.value = weigth_address_range + byte_per_input_block * (i // weight_matrix_iteration) # start address of the feature bank
dut.feature_prefetcher_req.in_features.value = input_matrix_size[1] # number of input features
dut.feature_prefetcher_req.out_features.value = input_matrix_size[0] # number of output features
dut.feature_prefetcher_req.nodeslot.value = 0 # not used for weight bank requests
dut.feature_prefetcher_req.nodeslot_precision.value = 1 # 01 is for fixed 8-bit precision
dut.feature_prefetcher_req.neighbour_count.value = 0 # not used for weight bank requests
# --------------------------------------------------
await Timer(10, units="ns")
p = 0
fetched_weight, fetched_input = False, False
while True:
    await RisingEdge(dut.clk)
    await Timer(10, units="ns")
    if dut.weight_prefetcher_resp_valid.value == 1:
        fetched_weight = True
    if dut.feature_prefetcher_resp_valid.value == 1:
        fetched_input = True
    if fetched_weight and fetched_input:
        break
    elif p==1000000:
        raise ValueError("Deadlock detected: weight_prefetcher_req_ready and feature_prefetcher_req_ready are not ready")
    p+=1
reset_nsb_prefetcher(dut)
# --------------------------------------------------
dut.nsb_fte_req_valid.value = 1 # enable the fte
dut.nsb_fte_req.precision.value = 1 # 01 is for fixed 8-bit precision
dut.layer_config_out_channel_count.value = input_matrix_size[0] # here we used the first dimension of the input matrix as output channel count
dut.layer_config_out_features_count.value = weight_matrix_size[0] # here we used the first dimension of the weight matrix as output features count
dut.layer_config_out_features_address_msb_value.value = (writeback_address >> 32) & 0b11 # 2 is for the msb of 34 bits address
dut.layer_config_out_features_address_lsb_value.value = writeback_address & 0xFFFFFFFF # 0 for the rest of the address
dut.writeback_offset.value = offset # 0 for the writeback offset
#---------------------------------------------------
print("Done instructing fte")
i = 0
while True:
    await RisingEdge(dut.clk)
    await Timer(10, units="ns")
    if dut.nsb_fte_resp_valid.value == 1:
        done = True
        break
    if i==1000000:
        done = False
        break
    i+=1
reset_fte(dut)
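The sticky-flag logic of the prefetcher wait loop can be modelled in plain Python to see how the sampling interval affects it. This is only a sketch under assumptions the post does not confirm: the response valids are taken to be single-cycle pulses, and `cycles_per_poll` is a hypothetical parameter standing for how many clock cycles pass between two samples (for instance, more than one if the `Timer` delay is comparable to the clock period). The function name and the traces are made up for illustration.

```python
def poll_until_both(weight_valid, feature_valid, max_polls, cycles_per_poll=1):
    """Model the wait loop over per-cycle 0/1 traces of equal length.

    Returns the cycle index at which both responses have been seen.
    Raises TimeoutError if max_polls is reached or the traces run out.
    """
    fetched_weight = fetched_input = False
    cycle = 0
    for _ in range(max_polls):
        if cycle >= len(weight_valid):
            break  # ran past the end of the recorded trace
        if weight_valid[cycle] == 1:
            fetched_weight = True  # latch: a pulse only has to be seen once
        if feature_valid[cycle] == 1:
            fetched_input = True
        if fetched_weight and fetched_input:
            return cycle
        cycle += cycles_per_poll  # cycles skipped between two samples
    raise TimeoutError("prefetcher responses not observed")


# Hypothetical traces: one-cycle pulses at cycle 3 (weight) and cycle 5 (feature).
weight = [0, 0, 0, 1, 0, 0, 0, 0]
feature = [0, 0, 0, 0, 0, 1, 0, 0]
print(poll_until_both(weight, feature, max_polls=100))
```

With `cycles_per_poll=2` the same traces are sampled only on even cycles, both pulses are missed, and the function raises `TimeoutError`. Whether that applies here depends on the clock period and on how long the DUT holds its response valids.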
It appears that introducing the additional while loop causes the read operation to hang, as depicted in the provided image.
By comparison, without that loop the read and write operations complete and everything seems to function correctly, as shown in the second image.
For reference, this is how I connected AxiRam to my hardware:
**cocotb:**
self.axi_ram = AxiRam(AxiBus.from_prefix(dut, "axi"), dut.clk, dut.rst, size=2**34)
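With `size=2**34`, a write-back address needs 34 bits, which the test bench splits into a 2-bit MSB field and a 32-bit LSB field. As a minimal sketch of that split with a range check added, where the function names and the constant `ADDR_BITS` are hypothetical and not part of the test bench:

```python
ADDR_BITS = 34  # assumption: matches size=2**34 in the AxiRam instantiation


def split_writeback_address(addr):
    """Split a 34-bit byte address into (msb, lsb) register values."""
    if not 0 <= addr < (1 << ADDR_BITS):
        raise ValueError(f"address {addr:#x} does not fit in {ADDR_BITS} bits")
    msb = (addr >> 32) & 0b11   # upper 2 bits
    lsb = addr & 0xFFFFFFFF     # lower 32 bits
    return msb, lsb


def join_writeback_address(msb, lsb):
    """Inverse of split_writeback_address, useful for checking logs."""
    return (msb << 32) | lsb


msb, lsb = split_writeback_address(0x2_0000_0010)
print(hex(msb), hex(lsb))
```

The range check turns an out-of-range address into an immediate error instead of a silently truncated write-back location.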
**SystemVerilog:**
axi_interface axi_ram (
.clk (clk),
.rst (rst),
.axi_awid (c0_ddr4_s_axi_awid),
.axi_awaddr (c0_ddr4_s_axi_awaddr),
.axi_awlen (c0_ddr4_s_axi_awlen),
.axi_awsize (c0_ddr4_s_axi_awsize),
.axi_awburst (c0_ddr4_s_axi_awburst),
.axi_awlock (c0_ddr4_s_axi_awlock),
.axi_awcache (c0_ddr4_s_axi_awcache),
.axi_awprot (c0_ddr4_s_axi_awprot),
.axi_awqos (c0_ddr4_s_axi_awqos), // not used
.axi_awregion (), // not used
.axi_awvalid (c0_ddr4_s_axi_awvalid),
.axi_awready (c0_ddr4_s_axi_awready),
.axi_wdata (c0_ddr4_s_axi_wdata),
.axi_wstrb (c0_ddr4_s_axi_wstrb),
.axi_wlast (c0_ddr4_s_axi_wlast),
.axi_wvalid (c0_ddr4_s_axi_wvalid),
.axi_wready (c0_ddr4_s_axi_wready),
.axi_bid (c0_ddr4_s_axi_bid),
.axi_bresp (c0_ddr4_s_axi_bresp),
.axi_bvalid (c0_ddr4_s_axi_bvalid),
.axi_bready (c0_ddr4_s_axi_bready),
.axi_arid (c0_ddr4_s_axi_arid),
.axi_araddr (c0_ddr4_s_axi_araddr),
.axi_arlen (c0_ddr4_s_axi_arlen),
.axi_arsize (c0_ddr4_s_axi_arsize),
.axi_arburst (c0_ddr4_s_axi_arburst),
.axi_arlock (c0_ddr4_s_axi_arlock),
.axi_arcache (c0_ddr4_s_axi_arcache),
.axi_arprot (c0_ddr4_s_axi_arprot),
.axi_arqos (c0_ddr4_s_axi_arqos), // not used prefetcher_weight_bank_rm_axi_interconnect_axi_arqos
.axi_arregion (), // not used
.axi_arvalid (c0_ddr4_s_axi_arvalid),
.axi_arready (c0_ddr4_s_axi_arready),
.axi_rid (c0_ddr4_s_axi_rid),
.axi_rdata (c0_ddr4_s_axi_rdata),
.axi_rresp (c0_ddr4_s_axi_rresp),
.axi_rlast (c0_ddr4_s_axi_rlast),
.axi_rvalid (c0_ddr4_s_axi_rvalid),
.axi_rready (c0_ddr4_s_axi_rready)
);
May I ask if there is any way to work around this? Thank you very much for your help.