Counting Words at SIMD Speed

My blog post: Counting Words at SIMD Speed

Whitespace-delimited word count implemented five ways.

Results from my Apple M1 Pro:

Python (byte loop): 89.6 s
Python + re: 13.7 s
C (scalar loop): 1.205 s
C + NEON: 249 ms
C + NEON + threads: 181 ms (5.52 GiB/s)

Includes a 1 GiB corpus generator and Hyperfine benchmarks.

./bench.sh
Benchmark 1: python3 0_mvp.py bench.txt
  Time (mean ± σ):     89.556 s ±  7.214 s    [User: 87.205 s, System: 0.484 s]
  Range (min … max):   82.393 s … 96.820 s    3 runs

Benchmark 2: python3 1_c_regex.py bench.txt
  Time (mean ± σ):     13.739 s ±  0.136 s    [User: 13.433 s, System: 0.158 s]
  Range (min … max):   13.659 s … 13.896 s    3 runs

Benchmark 3: ./bin/2_mvp bench.txt
  Time (mean ± σ):      1.205 s ±  0.008 s    [User: 1.015 s, System: 0.115 s]
  Range (min … max):    1.198 s …  1.214 s    3 runs

Benchmark 4: ./bin/3_simd bench.txt
  Time (mean ± σ):     249.2 ms ±   6.3 ms    [User: 79.5 ms, System: 100.5 ms]
  Range (min … max):   243.8 ms … 262.6 ms    11 runs

Benchmark 5: ./bin/4_threads bench.txt
  Time (mean ± σ):     181.1 ms ±   4.1 ms    [User: 96.5 ms, System: 93.5 ms]
  Range (min … max):   177.4 ms … 193.7 ms    16 runs

Summary
  ./bin/4_threads bench.txt ran
    1.38 ± 0.05 times faster than ./bin/3_simd bench.txt
    6.65 ± 0.16 times faster than ./bin/2_mvp bench.txt
   75.84 ± 1.86 times faster than python3 1_c_regex.py bench.txt
  494.38 ± 41.35 times faster than python3 0_mvp.py bench.txt

Files

0_mvp.py, 1_c_regex.py: Python baselines
2_mvp.c: scalar C
3_simd.c: ARM NEON SIMD
4_threads.c: multithreaded NEON
setup_benchmark.c: generates bench.txt (1 GiB)
bench.sh: builds everything and runs benchmarks

Requirements

macOS on Apple Silicon (ARM64)
clang, python3, hyperfine

Quick start

bash bench.sh

This produces bench.txt, builds binaries into bin/, and benchmarks all implementations.

Note: the Python ones are very slow – you might want to comment those out!

kevmo314/counting-words-at-simd-speed

Counting Words at SIMD Speed

Files

Requirements

Quick start