DRAM interface for WRITE/READ?

Question


Opened this issue 4 years ago · 5 comments

Hello,

I am looking at the DRAM interfaces provided for DRAMsim and Ramulator, and one thing is not clear to me. In dram_dramsim.cc, line 142, we see:
if (m_dramsim->addTransaction(req->m_type == MRT_WB,static_cast<uint64_t>(req->m_addr))) ..
and also in dram_ramulator.cc line 145 we have:
if (req->m_type != MRT_WB) { ...

Here it seems that a request generated by the cores is treated as a WRITE request only if it is of type MRT_WB; everything else is treated as a READ request to DRAM.
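To make the mapping concrete, here is a minimal sketch of how I read the two conditions quoted above. Only the names MRT_DFETCH, MRT_DSTORE and MRT_WB come from the MacSim code; the enum layout, DramTransaction, to_dram_transaction and demo_mapping are hypothetical names used for illustration, not the simulator's actual definitions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical subset of request types; only the names appear in the thread.
enum Mem_Req_Type { MRT_DFETCH, MRT_DSTORE, MRT_WB };

// Hypothetical stand-in for what gets handed to the DRAM model.
struct DramTransaction {
  bool is_write;
  std::uint64_t addr;
};

// Mirrors the quoted conditions: MRT_WB is the only type that becomes a
// DRAM write; every other type is issued as a DRAM read.
inline DramTransaction to_dram_transaction(Mem_Req_Type type,
                                           std::uint64_t addr) {
  return DramTransaction{type == MRT_WB, addr};
}

// Usage: a store from the core maps to a read, a write-back maps to a write.
inline void demo_mapping() {
  assert(!to_dram_transaction(MRT_DSTORE, 0x1000).is_write);
  assert(to_dram_transaction(MRT_WB, 0x1000).is_write);
}
```

Under this reading, a trace dominated by MRT_DSTORE would produce only DRAM reads unless write-backs reach the DRAM.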

Investigating with two traces from IsolBench (Bandwidth Read and Bandwidth Write), I can confirm that most requests from the Bandwidth Read application are of type MRT_DFETCH and most requests from the Bandwidth Write application are of type MRT_DSTORE (not MRT_WB). Therefore, in a simulation with Ramulator or DRAMsim, the DRAM always receives READ requests, even when the core generates MRT_DSTORE.

Can you elaborate on this and explain why requests are classified only as MRT_WB or not MRT_WB? I believe MRT_DSTORE should not be treated as a READ request (or am I wrong?).

Changing the condition so that m_type == MRT_WB || m_type == MRT_DSTORE counts as a WRITE request results in ASSERT FAILED with both Ramulator and DRAMsim.

Any feedback would be appreciated.

Answer 1 · 2020-10-03T18:34:11.000Z

Even for a write request from a core, the cache first has to bring the block in from memory, so the request becomes a DRAM READ. It is the eviction of a dirty block from the cache that becomes a DRAM write operation.
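As a rough illustration of that behavior, here is a toy direct-mapped, write-allocate, write-back cache. It is a sketch under simplifying assumptions and not MacSim's cache code; WriteBackCache, DramCounters and stream_stores are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Counts the traffic that would reach DRAM.
struct DramCounters {
  std::uint64_t reads = 0;
  std::uint64_t writes = 0;
};

// Toy direct-mapped cache with write-allocate and write-back policies.
class WriteBackCache {
 public:
  WriteBackCache(std::size_t num_lines, std::size_t line_bytes)
      : lines_(num_lines), line_bytes_(line_bytes) {
    assert(num_lines > 0 && line_bytes > 0);
  }

  void access(std::uint64_t addr, bool is_store, DramCounters& dram) {
    const std::uint64_t block = addr / line_bytes_;
    Line& line = lines_[block % lines_.size()];
    if (!line.valid || line.block != block) {       // miss
      if (line.valid && line.dirty) ++dram.writes;  // dirty eviction -> DRAM write
      ++dram.reads;  // fill from DRAM, for loads and stores alike
      line.valid = true;
      line.dirty = false;
      line.block = block;
    }
    if (is_store) line.dirty = true;  // the store only marks the line dirty
  }

 private:
  struct Line {
    bool valid = false;
    bool dirty = false;
    std::uint64_t block = 0;
  };
  std::vector<Line> lines_;
  std::size_t line_bytes_;
};

// Streams stores over num_blocks consecutive blocks and returns DRAM traffic.
inline DramCounters stream_stores(std::size_t num_lines, std::size_t line_bytes,
                                  std::size_t num_blocks) {
  WriteBackCache cache(num_lines, line_bytes);
  DramCounters dram;
  for (std::size_t i = 0; i < num_blocks; ++i) {
    cache.access(i * line_bytes, /*is_store=*/true, dram);
  }
  return dram;
}
```

In this model, stream_stores(4, 64, 4) produces 4 DRAM reads and no writes because nothing is evicted, while stream_stores(4, 64, 10) produces 10 reads and 6 writes. A store-only stream therefore shows DRAM writes only once its footprint exceeds the cache capacity.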

Answer 2 · 2020-10-03T19:27:49.000Z

Thanks for the explanation; yes, that makes sense. However, I do not see any write-backs during execution (I would expect to see some at some point), though this could be due to the cache configuration or the working-set size of the benchmark.

Answer 3 · 2020-10-06T20:28:32.000Z

As an update, it is still not clear to me how write-backs are handled. I don't see any dirty blocks being evicted, neither with the provided mergesort trace nor with my own traces. For instance, a streaming bandwidth benchmark that only writes to memory causes only MRT_DSTORE requests (which are READs to DRAM), and no write-back to DRAM is initiated in the simulation (there is a small number of WBs from L1 to L2 and from L2 to LLC).

I have tried all cache configurations and the issue persists. The other thing is that bypassing the caches does not seem to have any effect on performance (IPC, number of DRAM requests). Does that mean the caches are not being used at all?
Sorry, but I am really confused.

Answer 4 · 2020-10-07T12:03:19.000Z

We have debugged the memory system several times in the past and confirmed the write-back activity (but that doesn't mean it hasn't changed since). I'll try to take a look at it in a few days. Can you share more detailed information about your config and trace file so that we can replicate your experiment? Also, how did you bypass the caches?

Answer 5 · 2020-10-07T13:34:15.000Z

Thank you for getting back on this. Here is the detailed info of my trace and configs:

I am using bandwidth.c with an access type of either READ or WRITE, and the cache line size is 64. The trace is generated with the latest Pin version that was updated in the MacSim repo (3.7). In addition, I am using SIM_BEGIN(1) and SIM_END(1) in order to capture only the "actual accesses" in the source code. I attached the traces.
Note that for the mergesort.raw trace available in the repo, the same thing happens: there are WBs from L1 and L2 but never from the LLC.
I have tried many cache configurations. In previous versions of MacSim I could use no_cache as the memory type, but now it does not run and gives a seg fault. With llc_decoupled_network and l2_decoupled_network, I do see a small number of WBs from L1 and L2 but none from the LLC. I am using one x86 core with OoO scheduling. I attach the params.in and trace_file_list.
files.zip
Regarding the bypassing, I set the l2_large_bypass (l1_large_bypass) knobs in params.in for the mergesort trace. According to the MacSim documentation, every access to these cache levels should then be a miss; however, I still see cache hits with these knobs enabled.

I really appreciate your time on this.