Description
This project benchmarks several implementations of binary search over a dense array of PODs:
- bsearch_standard - vanilla binary search
- bsearch_binned - binary search with binning info prepended to the array -- saves first few iterations of the binary search
- bsearch_breadth - breadth-first (tree-like) layout of the array
- bsearch_veb - Van Emde Boas, recursive version
- bsearch_veb_iter - Van Emde Boas, iterative version
How to Build
The build script recognizes these options:
$ ./build_test_bsearch.sh
usage: ./build_test_bsearch.sh { gcc | clang | debug }
- gcc - use the system-default g++ compiler to build a release binary
- clang - use the system-default clang++ compiler to build a release binary
- debug - use the system-default clang++ compiler to build a debug binary
How to benchmark
The benchmark tool recognizes these options:
$ ./test_bsearch --help
usage: ./test_bsearch [space_size <unsigned>] [alt <unsigned>] [<sample_size>]
alt 0: standard binary search (default)
alt 1: binned binary search
alt 2: breadth-first layout binary search
alt 3: Van Emde Boas (VEB) layout binary search, recursive version
alt 4: VEB layout binary search, iterative version
alt 5: standard linear search
alt 6: binned linear search
The POD of the search space is hardcoded to float
-- to build for another POD change searchitem_t
in test_bsearch.cpp. The default search-set size (sample_size
) is set to 10M; the default search-space size (space_size
) is 2K. To benchmark the performance of standard binary search over a search-space size of 2^24 and a search-set size of 10M do:
$ ./test_bsearch space_size $(echo "2^24" | bc) alt 0
Warning: don't run any of the linear searches (alt
5 & 6) on large seach spaces unless you have unlimited machine time and patience.
Results
Averaged searches/second from 10M random binary searches in a 64MB dataset -- see graph below
CPU (single thread only) | bsearch_standard | bsearch_binned | bsearch_breadth | bsearch_veb | bsearch_veb_iter |
---|---|---|---|---|---|
Intel E3-1270v2 3.9GHz, DDR3-1600 x 128-bit = 25.6GB/s, g++-4.9.4 | 2,368,386 | 2,367,576 | 4,676,543 | 7,733,798 | 7,916,109 |
Intel E3-1270v2 3.9GHz, DDR3-1600 x 128-bit = 25.6GB/s, clang++-3.6.2 | 2,340,876 | 2,355,943 | 4,787,252 | 7,749,015 | 7,395,371 |
Intel E3-1270v2 1.6GHz, DDR3-1600 x 128-bit = 25.6GB/s, g++-4.9.4 | 1,233,215 | 1,314,080 | 2,067,070 | 3,686,770 | 3,754,413 |
Intel E3-1270v2 1.6GHz, DDR3-1600 x 128-bit = 25.6GB/s, clang++-3.6.2 | 1,182,673 | 1,307,087 | 2,066,907 | 3,682,098 | 3,431,577 |
Intel i7-5820k 3.6GHz, DDR4-2133 x 256-bit = 68.26GB/s, g++-6.3.1 | 2,377,322 | 2,413,404 | 3,128,115 | 6,488,559 | 5,683,215 |
Intel i7-5820k 3.6GHz, DDR4-2133 x 256-bit = 68.26GB/s, clang++-7.0.1 | 2,350,576 | 2,407,037 | 3,565,783 | 6,747,819 | 6,723,463 |
Intel W-2155 4.5GHz, DDR4-2666 x 256-bit = 85.3GB/s, g++-10.2 | 1,988,754 | 2,163,868 | 4,237,943 | 6,622,578 | 6,646,915 |
Intel W-2155 4.5GHz, DDR4-2666 x 256-bit = 85.3GB/s, clang++-10.0 | 2,004,589 | 2,206,814 | 3,950,995 | 8,030,376 | 8,045,392 |
Intel i7-9850H 4.6GHz, DDR4-2666 x 128-bit = 42.65GB/s, g++-9.3.0 | 2,865,838 | 3,008,460 | 5,091,687 | 7,120,814 | 7,047,737 |
Intel i7-9850H 4.6GHz, DDR4-2666 x 128-bit = 42.65GB/s, clang++-10.0.0 | 3,018,475 | 3,120,316 | 4,714,052 | 8,569,367 | 8,548,059 |
AMD Ryzen 1700 3.0GHz, DDR4-2400 x 128-bit = 38.4GB/s, g++-9.3.0 | 855,529 | 1,348,926 | 2,727,316 | 4,492,173 | 4,478,647 |
Mediatek MT8173C 2.1GHz, LPDDR3-1600 x 64-bit = 12.8GB/s, g++-8.1.0, A64 | 490,095 | 526,391 | 744,652 | 2,035,683 | 2,031,427 |
Mediatek MT8173C 2.1GHz, LPDDR3-1600 x 64-bit = 12.8GB/s, clang++-5, A64 | 505,499 | 526,628 | 724,331 | 2,051,007 | 2,052,157 |
Rockchip RK3399 2.0GHz, LPDDR3-1600 x 64-bit = 12.8GB/s, g++-8.1.0, A64 | 462,760 | 478,643 | 738,867 | 2,038,539 | 2,046,880 |
Rockchip RK3399 2.0GHz, LPDDR3-1600 x 64-bit = 12.8GB/s, clang++-5, A64 | 450,501 | 552,473 | 729,131 | 2,069,239 | 2,064,636 |
Amlogic S922X 1.8GHz, LPDDR4-2640 x 32-bit = 10.56GB/s, clang++-6, A64 | 304,954 | 396,626 | 753,789 | 2,035,342 | 1,976,181 |
Marvell A8040 1.3GHz, DDR4-1600 x 64-bit = 12.8GB/s, g++-8.1.0, A64 | 295,051 | 371,600 | 681,139 | 1,787,888 | 1,795,045 |
Marvell A8040 1.3GHz, DDR4-1600 x 64-bit = 12.8GB/s, clang++-5, A64 | 299,796 | 385,103 | 662,800 | 1,775,231 | 1,776,729 |
Marvell A8040 1.6GHz, DDR4-2100 x 64-bit = 16.8GB/s, g++-8.1.0, A64 | 357,403 | 457,187 | 835,043 | 2,164,438 | 2,174,108 |
Marvell A8040 1.6GHz, DDR4-2100 x 64-bit = 16.8GB/s, clang++-5, A64 | 362,563 | 468,497 | 815,416 | 2,151,163 | 2,190,350 |
Marvell A8040 2.0GHz, DDR4-2400 x 64-bit = 19.2GB/s, g++-8.1.0, A64 | 398,350 | 511,817 | 938,054 | 2,469,142 | 2,487,483 |
Marvell A8040 2.0GHz, DDR4-2400 x 64-bit = 19.2GB/s, clang++-5, A64 | 402,651 | 524,488 | 909,785 | 2,553,528 | 2,543,347 |
AWS Graviton 2.29GHz, DDR4-2400 x 64-bit = 19.2GB/s, g++-8.3.0, A64 | 687,076 | 760,548 | 1,205,667 | 3,089,063 | 3,087,659 |
AWS Graviton 2.29GHz, DDR4-2400 x 64-bit = 19.2GB/s, clang++-6, A64 | 613,374 | 729,347 | 1,195,206 | 3,120,435 | 3,119,255 |
NXP LX2160A 2.0GHz, DDR4-2400 x 64-bit = 19.2GB/s, g++-8.3.0, A64 | 464,008 | 510,753 | 1,437,537 | 3,143,859 | 3,144,161 |
NXP LX2160A 2.0GHz, DDR4-2400 x 64-bit = 19.2GB/s, clang++-8.0.0, A64 | 468,871 | 532,058 | 1,389,208 | 3,058,224 | 3,049,864 |
Tegra 210 1.428GHz, LPDDR4-3200 x 64-bit = 25.6GB/s, g++-8.2.0, A64 | 327,418 | 433,628 | 818,782 | 2,072,252 | 2,050,263 |
Tegra 210 1.428GHz, LPDDR4-3200 x 64-bit = 25.6GB/s, clang++-6, A64 | 323,666 | 434,564 | 824,378 | 1,924,824 | 1,953,150 |
Snapdragon SQ1 3.0GHz, LPDDR4-4266 x 128-bit = 68.26GB/s, g++-10.0.1, A64 | 957,554 | 977,243 | 2,045,156 | 4,421,279 | 4,433,569 |
Snapdragon SQ1 3.0GHz, LPDDR4-4266 x 128-bit = 68.26GB/s, clang++-9, A64 | 941,382 | 1,000,550 | 2,035,532 | 4,442,965 | 4,411,135 |
Apple A12z 2.49GHz, LPDDR4-4266 x 128-bit = 68.26GB/s, apple-clang++-12, A64 | 1,838,669 | 1,916,958 | 4,544,838 | 7,013,375 | 7,028,870 |
Apple M1 3.2GHz, LPDDR4-4266 x 128-bit = 68.26GB/s, apple-clang++-12, A64 | 2,958,828 | 2,988,595 | 8,042,966 | 10,160,952 | 10,153,486 |