This repository contains full examples and other associated code from https://en.algorithmica.org/hpc
The book is still unfinished, and my writing process is very slow and non-sequential — sometimes the "idea → code → benchmarks → article" pipeline may take 6 months or even more — so in this repository you can get a preview on a lot of interesting things that I haven't yet properly written up and published.
Things that have improved on the state-of-the-art:
- Many variants of binary search, the fastest one achieving ~15x speedup over
std::lower_bound
for small arrays (that fit in cache) and ~8x speedup for large arrays (>1e6). - Argmin at the speed of memory.
- Implementation of the Floyd-Warshall algorithm that is about 50x faster than the naive "for-for-for" algorithm.
Things that match current state-of-the-art:
- A version of a segment tree that can compute prefix sums in ~2ns plus the time of the slowest memory read.
- (✓ published) An implementation of GCD that works 2-3 faster than
std::gcd
. - Integer factorization taking ~0.5ms per 60-bit integer.
- An algorithm for parsing series of integers ~2x faster than
scanf("%d")
does. - An implementation of BLAS-level matrix multiplication that can be expressed in about 30 lines of C.
- Various efficient hash tables.
- Efficient FFT and Karatsuba algorithm implementations.
Various benchmarks:
- Benchmarks for branching and predication.
- Benchmarks for RAM and CPU cache system.
At the implementation stage:
- Ordered Trees (apply the same technique as with binary searching, but with dynamically-allocated B-tree nodes)
- Range minimum queries (both static and dynamic)
- Filters (Bloom, cuckoo, xor, theoretical minimum)
- Dot product / logistic regression (newton's method, SIMD, quantization)
- Prime number sieves (blocking plus wheel)
- Sorting (speeding up quicksort and mergesort with SIMD and radix sort)
- Writing series of integers (SIMD + fast mod-10)
- Bitmaps (blocking, SIMD)
At the idea stage:
- String searching (SIMD-based strstr and rolling hashing)
- Using SIMD to speed up Pollard's algorithm (naive sqrt-parallelization)
- SIMD-based random number generation and hashing