RRZE-HPC/pycachesim

Feature Request: Get loaded cache lines

stettberger opened this issue · 2 comments

As a user, I would like to know which memory addresses are loaded from the MainMemory. I came up with a "solution" that parses cachesim's log output, but that is a workaround rather than a real solution. Here it is for reference:

import sys
import io
from cachesim import CacheSimulator, Cache, MainMemory

mem = MainMemory()
l1 = Cache("L1", 128, 4, 32, "LRU")
mem.load_to(l1)
mem.store_from(l1)
cs = CacheSimulator(l1, mem)
# enable verbose backend logging, which cache_op parses below
l1.backend.verbosity = 4

def cache_op(cache, op, addr, width):
    """This is wild: run op(addr, width) and parse the captured log for missed CL ids."""
    old = cache.stats()['MISS_count']
    old_stdout = sys.stdout
    sys.stdout = captured = io.StringIO()
    try:
        op(addr, width)
    finally:
        # always restore stdout, even if op raises
        sys.stdout = old_stdout
    delta = cache.stats()['MISS_count'] - old
    log = captured.getvalue()
    cls = []
    for entry in log.split("\n"):
        if 'MISS' not in entry or 'LOAD' not in entry:
            continue
        # extract the integer that follows "cl_id=" up to the next space
        field = entry[entry.index("cl_id=") + len("cl_id="):]
        cl_id = int(field[:field.index(" ")])
        if cl_id not in cls:
            cls.append(cl_id)
    assert len(cls) == delta, log
    return cls

print('Loaded CL_ids:', cache_op(l1, cs.load, 16, 4))
print('Loaded CL_ids:', cache_op(l1, cs.load, 20, 4))
print('Loaded CL_ids:', cache_op(l1, cs.load, 32, 4))
print('Loaded CL_ids:', cache_op(l1, cs.load, 127, 4))

Result:

Loaded CL_ids: [0]
Loaded CL_ids: []
Loaded CL_ids: [1]
Loaded CL_ids: [3, 4]

Hi @stettberger,

sorry for the delayed reply!
I am not sure if my suggested solution fully meets your requirements, but maybe it is a first step in the right direction.

The cleanest way would be to check cache.backend.cached for a specific cache to see whether a cache line ID is in the cache. However, because this is a set and querying it is quite expensive in terms of computation, I would recommend introducing an additional dictionary data structure like this:

from cachesim import CacheSimulator, Cache, MainMemory
import collections
import itertools


def load_and_track_cl(cs: CacheSimulator, addr, length):
    """Load `length` bytes at `addr` and return the CL ids not seen before."""
    cl_ids = set()
    backend = cs.last_level.backend
    # make sure to have one address per CL in case length > cl_size;
    # the last byte is added because the access may straddle a CL boundary
    addresses = itertools.chain(range(addr, addr + length, backend.cl_size), [addr + length - 1])
    for ad in addresses:
        cl_id = ad >> backend.cl_bits
        if cl_id not in cs.cache_dict:
            cl_ids.add(cl_id)
            cs.cache_dict[cl_id] = True
    # do actual load
    cs.load(addr, length)
    return cl_ids


mem = MainMemory()
l1 = Cache("L1", 128, 4, 32, "LRU")
mem.load_to(l1)
mem.store_from(l1)
cs = CacheSimulator(l1, mem)
# create additional data structure in CacheSimulator
cs.cache_dict = collections.OrderedDict()

print("Loaded CL_ids: {}".format(load_and_track_cl(cs, 16, 4)))
print("Loaded CL_ids: {}".format(load_and_track_cl(cs, 20, 4)))
print("Loaded CL_ids: {}".format(load_and_track_cl(cs, 32, 4)))
print("Loaded CL_ids: {}".format(load_and_track_cl(cs, 127, 4)))
print("Loaded CL_ids: {}".format(load_and_track_cl(cs, 192, 80)))

Result:

Loaded CL_ids: {0}
Loaded CL_ids: set()
Loaded CL_ids: {1}
Loaded CL_ids: {3, 4}
Loaded CL_ids: {8, 6, 7}
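
To make the cache-line arithmetic behind these results explicit, here is a minimal standalone sketch that does not use cachesim at all. The helper name `touched_cl_ids` is hypothetical, and it assumes a power-of-two cache line size of 32 bytes, as in the example configuration above.

```python
def touched_cl_ids(addr, length, cl_size=32):
    """Return the IDs of all cache lines covered by bytes [addr, addr + length).

    Assumes cl_size is a power of two (32 bytes matches the example cache).
    """
    cl_bits = cl_size.bit_length() - 1      # log2(cl_size)
    first = addr >> cl_bits                 # line holding the first byte
    last = (addr + length - 1) >> cl_bits   # line holding the last byte
    return list(range(first, last + 1))


print(touched_cl_ids(16, 4))    # [0]
print(touched_cl_ids(127, 4))   # [3, 4]     bytes 127..130 straddle a boundary
print(touched_cl_ids(192, 80))  # [6, 7, 8]  bytes 192..271 span three lines
```

This only tells you which lines an access touches. Whether a touched line is actually fetched from main memory still depends on the state of the cache.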

I checked this against your code snippet, validated it with a few hundred thousand randomly generated memory addresses, and timed it, which gave me an approximate speedup of 3x.
Please keep in mind that there are cases in which this doesn't work, for example, if you have a victim cache.
I hope this helps nonetheless!
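
For anyone who wants to run a similar randomized check, here is a rough sketch. It is not the script used for the validation mentioned above, and it tests only the address enumeration, not the cache state. It compares the "one address per cache line plus the last byte" sampling from the snippet with a direct first-to-last computation. The constants and function names are assumptions chosen to match the 32-byte cache lines of the example.

```python
import itertools
import random

CL_SIZE = 32  # assumed cache line size in bytes, as in the example cache
CL_BITS = 5   # log2(CL_SIZE)


def sampled_cl_ids(addr, length):
    """CL ids from one sample address per CL_SIZE stride plus the last byte."""
    addresses = itertools.chain(range(addr, addr + length, CL_SIZE),
                                [addr + length - 1])
    return {ad >> CL_BITS for ad in addresses}


def exact_cl_ids(addr, length):
    """CL ids computed directly from the first and last byte of the access."""
    first = addr >> CL_BITS
    last = (addr + length - 1) >> CL_BITS
    return set(range(first, last + 1))


rng = random.Random(0)  # fixed seed so the run is reproducible
for _ in range(20_000):
    addr = rng.randrange(0, 1 << 20)
    length = rng.randrange(1, 512)
    assert sampled_cl_ids(addr, length) == exact_cl_ids(addr, length)
print("sampling and direct computation agree")
```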

Closing this issue for now, but I am happy to discuss the topic further!
Feel free to reopen.