consider profile-guided optimizations
cldellow opened this issue · 4 comments
There are lots of dumb performance sinkholes. E.g., inlining `currentRowSatisfiesFilter` speeds up the code by 10% in some cases. There's probably no reason not to inline it, as it was broken out only for readability and has only one caller. Still, finding these by hand will suck.
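To make the inlining point concrete, here is a minimal sketch, not the project's actual code. `Row`, `rowSatisfiesFilter` and `countMatching` are hypothetical stand-ins for the vtable's row state, `currentRowSatisfiesFilter` and its single caller; the `always_inline` attribute is a GCC/Clang extension.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical row type; the real vtable cursor holds far more state.
struct Row {
  std::int64_t profile_id;
};

// Stand-in for a single-caller predicate like currentRowSatisfiesFilter.
// always_inline (a GCC/Clang extension) asks the compiler to fold the body
// into the caller even where its own heuristics would not.
__attribute__((always_inline)) static inline bool rowSatisfiesFilter(
    const Row& row, std::int64_t wanted) {
  return row.profile_id == wanted;
}

// The only caller: the hot loop that scans rows and counts matches.
std::size_t countMatching(const std::vector<Row>& rows, std::int64_t wanted) {
  std::size_t matches = 0;
  for (const Row& row : rows) {
    if (rowSatisfiesFilter(row, wanted)) {
      ++matches;
    }
  }
  return matches;
}
```

The helper keeps its own name for readability, so nothing is lost by letting the compiler merge it into the loop.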
Go and read https://stackoverflow.com/questions/4365980/how-to-use-profile-guided-optimizations-in-g and see if we can apply it.
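For reference, the GCC flow boils down to three steps: build with `-fprofile-generate`, run a representative workload, then rebuild with `-fprofile-use`. The sketch below is only an illustration under assumptions, not this repo's build: `bench.cpp`, `makeColumn` and `countEqual` are hypothetical names, and a real setup would go through cmake rather than raw `g++` calls.

```cpp
// Hypothetical bench.cpp. Build steps, assuming g++ and a main() that calls
// countEqual(makeColumn(...), "1930"):
//
//   g++ -O2 -fprofile-generate bench.cpp -o bench   # instrumented build
//   ./bench                                         # run writes .gcda profile data
//   g++ -O2 -fprofile-use bench.cpp -o bench        # rebuild using that profile
#include <cstddef>
#include <string>
#include <vector>

// Builds a synthetic string column in which every `period`-th value is "1930"
// and all other values are distinct filler. `period` must be non-zero.
std::vector<std::string> makeColumn(std::size_t rows, std::size_t period) {
  std::vector<std::string> column;
  column.reserve(rows);
  for (std::size_t i = 0; i < rows; ++i) {
    column.push_back(i % period == 0 ? std::string("1930")
                                     : "row-" + std::to_string(i));
  }
  return column;
}

// Hot loop whose branch bias the recorded profile would capture.
std::size_t countEqual(const std::vector<std::string>& column,
                       const std::string& wanted) {
  std::size_t matches = 0;
  for (const std::string& value : column) {
    if (value == wanted) {
      ++matches;
    }
  }
  return matches;
}
```

The profile is only as good as the workload run in the second step, so that run should resemble the queries that matter.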
A profile generated from running `tests/test-all` results in:
- 10% on the census cyclist query, 42ms -> 38ms
- 11% on `select count(*) from census where profile_id = '1930'`, 444ms -> 393ms
- 4% on `select count(*) from census where profile_id = 1930`, 1920ms -> 1860ms
We could probably improve this by putting more realistic queries in the test dataset. Maybe Mark Litwintschik's blog post w/benchmarks could be used for data?
Anyway, worth pursuing as part of an automated release system.
This is only PGO on the vtable implementation?
I would be very interested if it makes a difference on `parquet-cpp` too.
Also, I usually profile a bit with `perf` and generate flame graphs (http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html). This might also be helpful.
Yes, this is just on the vtable code. I'll have a look at trying it on the rest of the toolchain, too.
Thanks for the reference to flame graphs.
Enabling it on parquet only didn't change anything; enabling it on parquet and arrow actually regressed things significantly. :( I can see the `.gcda` files being created and I'm pretty sure the make output shows it's picking up the correct libs. The enthusiast in me wants to dig in further, but given my relative lack of C++ experience, I think I'll have to put that on the shelf for now. :) Timings for posterity are below.
On gcc-5.5, vtable, parquet-cpp and arrow (via the `-DCMAKE_BUILD_TYPE=profile_gen` and `-DCMAKE_BUILD_TYPE=profile_build` cmake flags):
- `'1930'` -> 600ms
- `1930` -> 2200ms
On gcc-7.3, vtable, parquet-cpp and arrow:
- `'1930'` -> 600ms
- `1930` -> 2000ms
On gcc-7.3, just vtable:
- `'1930'` -> 376ms
- `1930` -> 1810ms