Writing to / reading from AxiRam by the hardware DUT hangs when adding while loops to the test bench

Question

Cocotb version: 0.1.24
Hardware simulator: QuestaSim

Hello, I've encountered an issue while working on my project. I've instantiated an AxiRam in Cocotb to serve as the model RAM for the DUT (Device Under Test), so that I can initialize its contents from the Cocotb test bench. Everything works well with small test cases, but I run into a problem when testing with a larger input scale.

To handle the larger input scale, I introduced a while loop in the test bench to wait for responses. Unfortunately, this seems to cause the write and read operations of the AxiRam to hang.

Due to the complexity of the DUT, I'm unable to provide the entire code. However, I can confirm that I haven't altered any inputs to the DUT itself. The only change was adding a while loop that keeps the test bench running until the responses arrive. The console output shows the write and read operations hanging.
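For illustration, the waiting pattern described above can be sketched without a simulator. This is a minimal sketch, not the actual test bench: `wait_for_all`, `tick`, `read_flags`, and `make_fake_dut` are hypothetical names, and a plain `asyncio` loop stands in for the simulator clock. It latches each response flag so that a one-cycle pulse is not missed, and it raises once a cycle budget is used up.

```python
import asyncio


async def wait_for_all(tick, read_flags, max_cycles):
    """Wait until every flag has been seen high at least once.

    tick       -- hypothetical callable returning an awaitable that
                  completes once per clock cycle
    read_flags -- hypothetical callable returning {name: bool},
                  sampled after each tick
    max_cycles -- cycle budget before giving up
    Returns the number of cycles waited.
    """
    seen = {}
    for cycle in range(1, max_cycles + 1):
        await tick()
        for name, value in read_flags().items():
            # Latch each flag so a one-cycle valid pulse is not lost.
            seen[name] = seen.get(name, False) or bool(value)
        if seen and all(seen.values()):
            return cycle
    missing = sorted(name for name, ok in seen.items() if not ok)
    raise TimeoutError(f"no response from {missing} after {max_cycles} cycles")


def make_fake_dut(weight_cycle, feature_cycle):
    """Stand-in for the DUT: each flag pulses high for exactly one cycle."""
    state = {"cycle": 0}

    async def tick():
        state["cycle"] += 1
        await asyncio.sleep(0)  # yield to the event loop, like a clock edge

    def read_flags():
        return {
            "weight": state["cycle"] == weight_cycle,
            "feature": state["cycle"] == feature_cycle,
        }

    return tick, read_flags


if __name__ == "__main__":
    tick, read_flags = make_fake_dut(weight_cycle=3, feature_cycle=7)
    print(asyncio.run(wait_for_all(tick, read_flags, max_cycles=100)))
```

In a cocotb test, `tick` could presumably be `lambda: RisingEdge(dut.clk)` and `read_flags` could sample `dut.weight_prefetcher_resp_valid.value` and `dut.feature_prefetcher_resp_valid.value`. That mapping is an assumption and has not been run against the DUT.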

For reference, here's an example of the test bench, in which the DUT reads from the AxiRam, performs its calculation on the data, and writes the result back at the end:

        dut.weight_prefetcher_req_valid.value = 1                               # enable the prefetcher
        dut.weight_prefetcher_req.req_opcode.value   = 0                        # 00 is for weight bank requests
        dut.weight_prefetcher_req.start_address.value = byte_per_weight_block * (i % weight_matrix_iteration)   # start address of the weight bank
        dut.weight_prefetcher_req.in_features.value  = weight_matrix_size[1]    # number of input features                     
        dut.weight_prefetcher_req.out_features.value = weight_matrix_size[0]    # number of output features
        dut.weight_prefetcher_req.nodeslot.value     = 0                        # not used for weight bank requests
        dut.weight_prefetcher_req.nodeslot_precision.value = 1                  # 01 is for fixed 8-bit precision
        dut.weight_prefetcher_req.neighbour_count.value = 0                     # not used for weight bank requests
        # --------------------------------------------------
        dut.feature_prefetcher_req_valid.value = 1                              # enable the prefetcher
        dut.feature_prefetcher_req.req_opcode.value   = 0                       # 00 is for weight bank requests
        dut.feature_prefetcher_req.start_address.value  = weigth_address_range + byte_per_input_block * (i // weight_matrix_iteration)   # start address of the feature bank
        dut.feature_prefetcher_req.in_features.value  = input_matrix_size[1]    # number of input features
        dut.feature_prefetcher_req.out_features.value = input_matrix_size[0]    # number of output features
        dut.feature_prefetcher_req.nodeslot.value     = 0                       # not used for weight bank requests
        dut.feature_prefetcher_req.nodeslot_precision.value = 1                 # 01 is for fixed 8-bit precision
        dut.feature_prefetcher_req.neighbour_count.value = 0                    # not used for weight bank requests
        # --------------------------------------------------
        dut.nsb_fte_req_valid.value = 1                                         # enable the fte
        dut.nsb_fte_req.precision.value = 1                                     # 01 is for fixed 8-bit precision
        dut.layer_config_out_channel_count.value = input_matrix_size[0]         # here we used the first dimension of the input matrix as output channel count
        dut.layer_config_out_features_count.value = weight_matrix_size[0]       # here we used the first dimension of the weight matrix as output features count       
        dut.layer_config_out_features_address_msb_value.value = (writeback_address >> 32) & 0b11        # 2 is for the msb of 34 bits address
        dut.layer_config_out_features_address_lsb_value.value = writeback_address & 0xFFFFFFFF          # 0 for the rest of the address
        dut.writeback_offset.value = offset                                     # 0 for the writeback offset
        #---------------------------------------------------
        print("Done instructing fte")
        i = 0
        while True:
            await RisingEdge(dut.clk)
            await Timer(10, units="ns")
            if dut.nsb_fte_resp_valid.value == 1:
                done = True
                break
            
            if i==1000000:
                done = False
                break
            i+=1
        reset_fte(dut)

This test bench passes, and all the read and write logs on the console look correct. However, the problem appears once I add a while loop that waits for the prefetcher responses, as shown below:

        dut.weight_prefetcher_req_valid.value = 1                               # enable the prefetcher
        dut.weight_prefetcher_req.req_opcode.value   = 0                        # 00 is for weight bank requests
        dut.weight_prefetcher_req.start_address.value = byte_per_weight_block * (i % weight_matrix_iteration)   # start address of the weight bank
        dut.weight_prefetcher_req.in_features.value  = weight_matrix_size[1]    # number of input features                     
        dut.weight_prefetcher_req.out_features.value = weight_matrix_size[0]    # number of output features
        dut.weight_prefetcher_req.nodeslot.value     = 0                        # not used for weight bank requests
        dut.weight_prefetcher_req.nodeslot_precision.value = 1                  # 01 is for fixed 8-bit precision
        dut.weight_prefetcher_req.neighbour_count.value = 0                     # not used for weight bank requests
        # --------------------------------------------------
        dut.feature_prefetcher_req_valid.value = 1                              # enable the prefetcher
        dut.feature_prefetcher_req.req_opcode.value   = 0                       # 00 is for weight bank requests
        dut.feature_prefetcher_req.start_address.value  = weigth_address_range + byte_per_input_block * (i // weight_matrix_iteration)   # start address of the feature bank
        dut.feature_prefetcher_req.in_features.value  = input_matrix_size[1]    # number of input features
        dut.feature_prefetcher_req.out_features.value = input_matrix_size[0]    # number of output features
        dut.feature_prefetcher_req.nodeslot.value     = 0                       # not used for weight bank requests
        dut.feature_prefetcher_req.nodeslot_precision.value = 1                 # 01 is for fixed 8-bit precision
        dut.feature_prefetcher_req.neighbour_count.value = 0                    # not used for weight bank requests
        # --------------------------------------------------
        await Timer(10, units="ns")
        p = 0
        fetched_weight, fetched_input = False, False
        while True:
            await RisingEdge(dut.clk)
            await Timer(10, units="ns")
            if dut.weight_prefetcher_resp_valid.value == 1:
                fetched_weight = True
            if dut.feature_prefetcher_resp_valid.value == 1:
                fetched_input = True
            if fetched_weight and fetched_input:
                break
            elif p==1000000:
                raise ValueError("Deadlock detected: weight_prefetcher_req_ready and feature_prefetcher_req_ready are not ready")
            p+=1
        reset_nsb_prefetcher(dut)
        # --------------------------------------------------
        dut.nsb_fte_req_valid.value = 1                                         # enable the fte
        dut.nsb_fte_req.precision.value = 1                                     # 01 is for fixed 8-bit precision
        dut.layer_config_out_channel_count.value = input_matrix_size[0]         # here we used the first dimension of the input matrix as output channel count
        dut.layer_config_out_features_count.value = weight_matrix_size[0]       # here we used the first dimension of the weight matrix as output features count       
        dut.layer_config_out_features_address_msb_value.value = (writeback_address >> 32) & 0b11        # 2 is for the msb of 34 bits address
        dut.layer_config_out_features_address_lsb_value.value = writeback_address & 0xFFFFFFFF          # 0 for the rest of the address
        dut.writeback_offset.value = offset                                     # 0 for the writeback offset
        #---------------------------------------------------
        print("Done instructing fte")
        i = 0
        while True:
            await RisingEdge(dut.clk)
            await Timer(10, units="ns")
            if dut.nsb_fte_resp_valid.value == 1:
                done = True
                break
            
            if i==1000000:
                done = False
                break
            i+=1
        reset_fte(dut)

With the added while loop, the read operation hangs, as depicted in the provided image.
Without the while loop, the read and write operations complete correctly, as shown in the second image.

For reference, this is how I connected AxiRam to my hardware:

**cocotb:** 
  self.axi_ram = AxiRam(AxiBus.from_prefix(dut, "axi"), dut.clk, dut.rst, size=2**34)
**SystemVerilog:**
axi_interface axi_ram (
    .clk                        (clk),
    .rst                        (rst),

    .axi_awid                   (c0_ddr4_s_axi_awid),
    .axi_awaddr                 (c0_ddr4_s_axi_awaddr),
    .axi_awlen                  (c0_ddr4_s_axi_awlen),
    .axi_awsize                 (c0_ddr4_s_axi_awsize),
    .axi_awburst                (c0_ddr4_s_axi_awburst),
    .axi_awlock                 (c0_ddr4_s_axi_awlock),
    .axi_awcache                (c0_ddr4_s_axi_awcache),
    .axi_awprot                 (c0_ddr4_s_axi_awprot),
    .axi_awqos                  (c0_ddr4_s_axi_awqos), // not used 
    .axi_awregion               (), // not used
    .axi_awvalid                (c0_ddr4_s_axi_awvalid),
    .axi_awready                (c0_ddr4_s_axi_awready),
    .axi_wdata                  (c0_ddr4_s_axi_wdata),
    .axi_wstrb                  (c0_ddr4_s_axi_wstrb),
    .axi_wlast                  (c0_ddr4_s_axi_wlast),
    .axi_wvalid                 (c0_ddr4_s_axi_wvalid),
    .axi_wready                 (c0_ddr4_s_axi_wready),
    .axi_bid                    (c0_ddr4_s_axi_bid),
    .axi_bresp                  (c0_ddr4_s_axi_bresp),
    .axi_bvalid                 (c0_ddr4_s_axi_bvalid),
    .axi_bready                 (c0_ddr4_s_axi_bready),
    .axi_arid                   (c0_ddr4_s_axi_arid),
    .axi_araddr                 (c0_ddr4_s_axi_araddr),
    .axi_arlen                  (c0_ddr4_s_axi_arlen),
    .axi_arsize                 (c0_ddr4_s_axi_arsize),
    .axi_arburst                (c0_ddr4_s_axi_arburst),
    .axi_arlock                 (c0_ddr4_s_axi_arlock),
    .axi_arcache                (c0_ddr4_s_axi_arcache),
    .axi_arprot                 (c0_ddr4_s_axi_arprot),
    .axi_arqos                  (c0_ddr4_s_axi_arqos), // not used prefetcher_weight_bank_rm_axi_interconnect_axi_arqos
    .axi_arregion               (), // not used
    .axi_arvalid                (c0_ddr4_s_axi_arvalid),
    .axi_arready                (c0_ddr4_s_axi_arready),
    .axi_rid                    (c0_ddr4_s_axi_rid),
    .axi_rdata                  (c0_ddr4_s_axi_rdata),
    .axi_rresp                  (c0_ddr4_s_axi_rresp),
    .axi_rlast                  (c0_ddr4_s_axi_rlast),
    .axi_rvalid                 (c0_ddr4_s_axi_rvalid),
    .axi_rready                 (c0_ddr4_s_axi_rready)
);
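
The `size=2**34` passed to AxiRam corresponds to the 34-bit write-back address that the test bench splits into `layer_config_out_features_address_msb_value` (upper 2 bits) and `layer_config_out_features_address_lsb_value` (lower 32 bits). Below is a small worked example of that split. The address `0x2_0000_0000` is an assumed value suggested by the comments in the test bench, and `split_address` and `join_address` are hypothetical helper names.

```python
ADDR_BITS = 34  # matches size=2**34 in the AxiRam instantiation
LSB_BITS = 32   # width of the lsb field


def split_address(address):
    """Split a 34-bit address into (msb, lsb) using the test bench's masks."""
    if not 0 <= address < (1 << ADDR_BITS):
        raise ValueError(f"address {address:#x} does not fit in {ADDR_BITS} bits")
    msb = (address >> LSB_BITS) & 0b11  # bits [33:32]
    lsb = address & 0xFFFFFFFF          # bits [31:0]
    return msb, lsb


def join_address(msb, lsb):
    """Rebuild the full address from the two fields."""
    return (msb << LSB_BITS) | lsb


if __name__ == "__main__":
    msb, lsb = split_address(0x2_0000_0000)  # assumed write-back address
    print(msb, hex(lsb))
```

For this assumed address the msb field is 2 and the lsb field is 0, which agrees with the comments on those two assignments in the test bench.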

Is there any way to work around this? Thank you very much for your help.