New C++ version 3x faster on 10k key dataset

Question

lehuyduc opened this issue 8 months ago · 6 comments

Hi, I've updated my code to optimize for the 10k keys dataset. On my PC it's ~3x faster (excluding munmap time) than the commit you tested. Default dataset performance is a bit slower.

Just run ./run_cpp.sh to compile and run.

To test the effect of hyper-threading, you can run ./run_cpp.sh 12 12 (where 12 is the total number of threads on your CPU). You will see interesting effects on the 10K dataset :D

Thanks! Looking forward to your updated results.

Answer 1 · 2024-01-17T19:49:10.000Z

buybackoff commented 8 months ago

Answer 2 · 2024-01-17T20:24:04.000Z

@lehuyduc I assume you checked the output vs correct one? I'm too lazy to redo that every time.

Answer 3 · 2024-01-17T20:32:14.000Z

Yes. I tested on 3 different measurements.txt files, and the output is correct. If I find any new errors, I'll fix them.

Could you also test ./run_cpp.sh 12 12? It should show how hyper-threading hurts performance when there are many branch misses or L3 cache misses.

Answer 4 · 2024-01-17T20:37:22.000Z

./run_cpp.sh 12 12 and ./run_cpp.sh 12 6 are not much different: the results agree to the second decimal, even if the delta is greater than sigma. I didn't look too deeply into that.
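
For anyone repeating this comparison, the "delta vs. sigma" check can be sketched roughly as below. This is only an illustration: it assumes you have collected wall-clock times (in seconds) from several runs of each configuration by hand, and `Stats`, `summarize`, and `differs_beyond_noise` are hypothetical names that are not part of the repo or of run_cpp.sh.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Mean and sample standard deviation of a set of run times (seconds).
struct Stats {
    double mean;
    double sigma;
};

// Hypothetical helper: summarizes timings collected from repeated runs.
// Assumes at least two samples, so the sample standard deviation is defined.
Stats summarize(const std::vector<double>& times) {
    double sum = 0.0;
    for (double t : times) sum += t;
    const double mean = sum / static_cast<double>(times.size());

    double sq = 0.0;
    for (double t : times) sq += (t - mean) * (t - mean);
    const double sigma = std::sqrt(sq / static_cast<double>(times.size() - 1));
    return {mean, sigma};
}

// True when the gap between the two means exceeds the larger of the two sigmas.
// This is a crude noise check, not a formal significance test.
bool differs_beyond_noise(const Stats& a, const Stats& b) {
    const double delta = std::fabs(a.mean - b.mean);
    const double sigma = a.sigma > b.sigma ? a.sigma : b.sigma;
    return delta > sigma;
}
```

Feed `summarize` the timings from each configuration (for example, one vector for the 12 12 runs and one for the 12 6 runs) and pass both results to `differs_beyond_noise`. Using the larger sigma is a conservative choice made for this sketch.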

Answer 5 · 2024-01-17T20:41:01.000Z

Huh, so I guess this is an AMD-specific problem. If I run with all virtual threads on the 2950X, it's much slower, by 30% or more. Anyway, thanks for testing!

Answer 6 · 2024-01-17T21:43:03.000Z

The blog update is now deploying.