martinus/map_benchmark

add find large

ktprime opened this issue · 2 comments

RandomFind_2000 is a small dataset (load factor = 0.49)
and RandomFind_500000 is a medium dataset (load factor = 0.48).
I think changing the latter to 600000 (load factor = 0.57) would be better.

Please also add a large dataset of 12'000'000 entries (load factor = 0.71).
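
For reference, these load factors line up with power-of-two table capacities (an assumption; the exact growth policy depends on the map):

2'000      / 2^12 (4'096)      ≈ 0.49
500'000    / 2^20 (1'048'576)  ≈ 0.48
600'000    / 2^20 (1'048'576)  ≈ 0.57
12'000'000 / 2^24 (16'777'216) ≈ 0.71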


BENCHMARK(RandomFind_12000000) {
    // the masks pick which 32 bits of each 64-bit key are randomized,
    // which stresses different parts of the hash function
    static constexpr auto lower32bit  = UINT64_C(0x00000000FFFFFFFF);
    static constexpr auto upper32bit  = UINT64_C(0xFFFFFFFF00000000);
    static constexpr auto medium32bit = UINT64_C(0x0000FFFFFFFF0000);

    static constexpr size_t numInserts = 12'000'000;
    static constexpr size_t numFindsPerInsert = 10;

    // endMeasure(expected checksum, actual result); the checksum counts
    // successful finds. The second randomFindInternal argument controls the
    // find success rate: with 4 (almost) no lookups succeed, with 0 all
    // ~120 million of them do (numInserts * numFindsPerInsert).
    bench.endMeasure(169388,    randomFindInternal(bench, 4, upper32bit,  numInserts, numFindsPerInsert));
    bench.endMeasure(30126934,  randomFindInternal(bench, 3, upper32bit,  numInserts, numFindsPerInsert));
    bench.endMeasure(60082938,  randomFindInternal(bench, 2, medium32bit, numInserts, numFindsPerInsert));
    bench.endMeasure(90041230,  randomFindInternal(bench, 1, lower32bit,  numInserts, numFindsPerInsert));
    bench.endMeasure(119999853, randomFindInternal(bench, 0, lower32bit,  numInserts, numFindsPerInsert));
}

Actually, the benchmark also adds items so that it tests with different load factors. On average, the load factor for your table8 is 0.57397 (min 0.4, max 0.8).
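
For context, here is a minimal sketch (not the benchmark's actual code) of how such per-insert load-factor statistics can be gathered. It uses std::unordered_map because it exposes load_factor(); the resulting min/max reflect that map's growth policy, not table8's:

#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <unordered_map>

int main() {
    std::unordered_map<uint64_t, uint64_t> map;
    float minLf = 1.0f, maxLf = 0.0f;
    double sumLf = 0.0;
    size_t constexpr numInserts = 12'000'000;
    for (size_t i = 0; i < numInserts; ++i) {
        // the multiplicative constant just spreads the keys; any 64-bit keys work
        map[i * UINT64_C(0x9E3779B97F4A7C15)] = i;
        float const lf = map.load_factor();
        minLf = std::min(minLf, lf);
        maxLf = std::max(maxLf, lf);
        sumLf += lf;  // sample after every insert, then average over the run
    }
    std::printf("load factor: min %.2f, avg %.5f, max %.2f\n",
                minLf, sumLf / numInserts, maxLf);
}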

For the small dataset, all keys and values fit in the CPU cache.
For the medium dataset, the metadata of some hash maps is also CPU-cached (which is why absl::flat_hash_map is fastest there), but for the large dataset it's quite different: neither the data nor the metadata fits.
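
A back-of-the-envelope footprint calculation makes this concrete (a sketch with assumed sizes; metadata layout and cache sizes vary by map and CPU):

#include <cstddef>
#include <cstdint>
#include <cstdio>

int main() {
    // payload only; open-addressing maps add metadata on top of this
    // (e.g. one control byte per slot in absl::flat_hash_map) plus the
    // unused slots implied by the load factor
    size_t constexpr entryBytes = 2 * sizeof(uint64_t);  // 8-byte key + 8-byte value
    size_t constexpr numEntries[] = {2'000, 500'000, 12'000'000};
    for (size_t n : numEntries) {
        std::printf("%10zu entries -> %8.2f MiB payload\n",
                    n, n * entryBytes / (1024.0 * 1024.0));
    }
    // roughly 0.03, 7.6 and 183 MiB: the small set fits in L1/L2, the medium
    // set's metadata (~1 MiB for 2^20 slots) can still fit in L3, and the
    // large set misses the caches on almost every lookup
}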