/bsearch

Benchmarking various implementations of binary search

Primary LanguageC++MIT LicenseMIT

Description

This project benchmarks several implementations of binary search over a dense array of PODs:

  • bsearch_standard - vanilla binary search
  • bsearch_binned - binary search with binning info prepended to the array -- saves first few iterations of the binary search
  • bsearch_breadth - breadth-first (tree-like) layout of the array
  • bsearch_veb - Van Emde Boas, recursive version
  • bsearch_veb_iter - Van Emde Boas, iterative version

How to Build

The build script recognizes these options:

$ ./build_test_bsearch.sh 
usage: ./build_test_bsearch.sh { gcc | clang | debug }
  • gcc - use the system-default g++ compiler to build a release binary
  • clang - use the system-default clang++ compiler to build a release binary
  • debug - use the system-default clang++ compiler to build a debug binary

How to benchmark

The benchmark tool recognizes these options:

$ ./test_bsearch --help
usage: ./test_bsearch [space_size <unsigned>] [alt <unsigned>] [<sample_size>]
        alt 0: standard binary search (default)
        alt 1: binned binary search
        alt 2: breadth-first layout binary search
        alt 3: Van Emde Boas (VEB) layout binary search, recursive version
        alt 4: VEB layout binary search, iterative version
        alt 5: standard linear search
        alt 6: binned linear search

The POD of the search space is hardcoded to float -- to build for another POD change searchitem_t in test_bsearch.cpp. The default search-set size (sample_size) is set to 10M; the default search-space size (space_size) is 2K. To benchmark the performance of standard binary search over a search-space size of 2^24 and a search-set size of 10M do:

$ ./test_bsearch space_size $(echo "2^24" | bc) alt 0

Warning: don't run any of the linear searches (alt 5 & 6) on large seach spaces unless you have unlimited machine time and patience.

Results

Averaged searches/second from 10M random binary searches in a 64MB dataset -- see graph below

CPU (single thread only) bsearch_standard bsearch_binned bsearch_breadth bsearch_veb bsearch_veb_iter
Intel E3-1270v2 3.9GHz, DDR3-1600 x 128-bit = 25.6GB/s, g++-4.9.4 2,368,386 2,367,576 4,676,543 7,733,798 7,916,109
Intel E3-1270v2 3.9GHz, DDR3-1600 x 128-bit = 25.6GB/s, clang++-3.6.2 2,340,876 2,355,943 4,787,252 7,749,015 7,395,371
Intel E3-1270v2 1.6GHz, DDR3-1600 x 128-bit = 25.6GB/s, g++-4.9.4 1,233,215 1,314,080 2,067,070 3,686,770 3,754,413
Intel E3-1270v2 1.6GHz, DDR3-1600 x 128-bit = 25.6GB/s, clang++-3.6.2 1,182,673 1,307,087 2,066,907 3,682,098 3,431,577
Intel i7-5820k 3.6GHz, DDR4-2133 x 256-bit = 68.26GB/s, g++-6.3.1 2,377,322 2,413,404 3,128,115 6,488,559 5,683,215
Intel i7-5820k 3.6GHz, DDR4-2133 x 256-bit = 68.26GB/s, clang++-7.0.1 2,350,576 2,407,037 3,565,783 6,747,819 6,723,463
Intel W-2155 4.5GHz, DDR4-2666 x 256-bit = 85.3GB/s, g++-10.2 1,988,754 2,163,868 4,237,943 6,622,578 6,646,915
Intel W-2155 4.5GHz, DDR4-2666 x 256-bit = 85.3GB/s, clang++-10.0 2,004,589 2,206,814 3,950,995 8,030,376 8,045,392
Intel i7-9850H 4.6GHz, DDR4-2666 x 128-bit = 42.65GB/s, g++-9.3.0 2,865,838 3,008,460 5,091,687 7,120,814 7,047,737
Intel i7-9850H 4.6GHz, DDR4-2666 x 128-bit = 42.65GB/s, clang++-10.0.0 3,018,475 3,120,316 4,714,052 8,569,367 8,548,059
AMD Ryzen 1700 3.0GHz, DDR4-2400 x 128-bit = 38.4GB/s, g++-9.3.0 855,529 1,348,926 2,727,316 4,492,173 4,478,647
Mediatek MT8173C 2.1GHz, LPDDR3-1600 x 64-bit = 12.8GB/s, g++-8.1.0, A64 490,095 526,391 744,652 2,035,683 2,031,427
Mediatek MT8173C 2.1GHz, LPDDR3-1600 x 64-bit = 12.8GB/s, clang++-5, A64 505,499 526,628 724,331 2,051,007 2,052,157
Rockchip RK3399 2.0GHz, LPDDR3-1600 x 64-bit = 12.8GB/s, g++-8.1.0, A64 462,760 478,643 738,867 2,038,539 2,046,880
Rockchip RK3399 2.0GHz, LPDDR3-1600 x 64-bit = 12.8GB/s, clang++-5, A64 450,501 552,473 729,131 2,069,239 2,064,636
Amlogic S922X 1.8GHz, LPDDR4-2640 x 32-bit = 10.56GB/s, clang++-6, A64 304,954 396,626 753,789 2,035,342 1,976,181
Marvell A8040 1.3GHz, DDR4-1600 x 64-bit = 12.8GB/s, g++-8.1.0, A64 295,051 371,600 681,139 1,787,888 1,795,045
Marvell A8040 1.3GHz, DDR4-1600 x 64-bit = 12.8GB/s, clang++-5, A64 299,796 385,103 662,800 1,775,231 1,776,729
Marvell A8040 1.6GHz, DDR4-2100 x 64-bit = 16.8GB/s, g++-8.1.0, A64 357,403 457,187 835,043 2,164,438 2,174,108
Marvell A8040 1.6GHz, DDR4-2100 x 64-bit = 16.8GB/s, clang++-5, A64 362,563 468,497 815,416 2,151,163 2,190,350
Marvell A8040 2.0GHz, DDR4-2400 x 64-bit = 19.2GB/s, g++-8.1.0, A64 398,350 511,817 938,054 2,469,142 2,487,483
Marvell A8040 2.0GHz, DDR4-2400 x 64-bit = 19.2GB/s, clang++-5, A64 402,651 524,488 909,785 2,553,528 2,543,347
AWS Graviton 2.29GHz, DDR4-2400 x 64-bit = 19.2GB/s, g++-8.3.0, A64 687,076 760,548 1,205,667 3,089,063 3,087,659
AWS Graviton 2.29GHz, DDR4-2400 x 64-bit = 19.2GB/s, clang++-6, A64 613,374 729,347 1,195,206 3,120,435 3,119,255
NXP LX2160A 2.0GHz, DDR4-2400 x 64-bit = 19.2GB/s, g++-8.3.0, A64 464,008 510,753 1,437,537 3,143,859 3,144,161
NXP LX2160A 2.0GHz, DDR4-2400 x 64-bit = 19.2GB/s, clang++-8.0.0, A64 468,871 532,058 1,389,208 3,058,224 3,049,864
Tegra 210 1.428GHz, LPDDR4-3200 x 64-bit = 25.6GB/s, g++-8.2.0, A64 327,418 433,628 818,782 2,072,252 2,050,263
Tegra 210 1.428GHz, LPDDR4-3200 x 64-bit = 25.6GB/s, clang++-6, A64 323,666 434,564 824,378 1,924,824 1,953,150
Snapdragon SQ1 3.0GHz, LPDDR4-4266 x 128-bit = 68.26GB/s, g++-10.0.1, A64 957,554 977,243 2,045,156 4,421,279 4,433,569
Snapdragon SQ1 3.0GHz, LPDDR4-4266 x 128-bit = 68.26GB/s, clang++-9, A64 941,382 1,000,550 2,035,532 4,442,965 4,411,135
Apple A12z 2.49GHz, LPDDR4-4266 x 128-bit = 68.26GB/s, apple-clang++-12, A64 1,838,669 1,916,958 4,544,838 7,013,375 7,028,870
Apple M1 3.2GHz, LPDDR4-4266 x 128-bit = 68.26GB/s, apple-clang++-12, A64 2,958,828 2,988,595 8,042,966 10,160,952 10,153,486

10M average searches/second

10M average searches/second