"Performance aware programming" course

My solutions and sandbox for the course: https://www.computerenhance.com/

Solutions are mostly written in Zig or C/C++ (using the Zig build system). Zig version 0.11.0 unless specified otherwise.

Haversine distance

haversine.zig

How to run

If you don't have a data set, generate it first:

zig build -Doptimize=ReleaseFast run -- -f 10m.json -g 10000000

Run processing & benchmarking:

zig build -Doptimize=ReleaseFast run -- -f 10m.json -i 5 -r

Notes

  • Generator: ~40x speedup from buffering file I/O. The C/C++ standard library buffers by default (unless disabled); Zig gives you full control over it, so buffering takes a few extra lines of code.
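
The buffering point can be illustrated in C, since the note compares against the C standard library. This is a minimal sketch, not the repository's Zig generator: `write_pairs`, `random_in_range`, the 1 MiB buffer size, and the JSON field names are all hypothetical. stdio already buffers a `FILE *` by default; `setvbuf` only makes the buffer explicit and larger, while passing `_IONBF` would switch buffering off.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: uniform random double in [lo, hi]. */
static double random_in_range(double lo, double hi)
{
    return lo + (hi - lo) * ((double)rand() / (double)RAND_MAX);
}

/* Hypothetical helper: writes `count` random coordinate pairs as JSON to
   `path` through an explicitly sized stdio buffer.
   Returns 0 on success, -1 on any I/O error. */
int write_pairs(const char *path, size_t count)
{
    FILE *out = fopen(path, "w");
    if (out == NULL) return -1;

    /* Fully buffered, 1 MiB (an arbitrary size chosen for illustration).
       _IONBF here would disable buffering instead. */
    if (setvbuf(out, NULL, _IOFBF, 1u << 20) != 0) {
        fclose(out);
        return -1;
    }

    fputs("{\"pairs\":[\n", out);
    for (size_t i = 0; i < count; i++) {
        /* x is longitude in [-180, 180], y is latitude in [-90, 90]. */
        fprintf(out, "%s{\"x0\":%.6f,\"y0\":%.6f,\"x1\":%.6f,\"y1\":%.6f}",
                i == 0 ? "" : ",\n",
                random_in_range(-180.0, 180.0), random_in_range(-90.0, 90.0),
                random_in_range(-180.0, 180.0), random_in_range(-90.0, 90.0));
    }
    fputs("\n]}\n", out);

    /* fclose flushes whatever is still sitting in the buffer. */
    int failed = ferror(out);
    if (fclose(out) != 0) failed = 1;
    return failed ? -1 : 0;
}
```

Each `fprintf` call only copies bytes into the in-memory buffer; the operating system sees a write only when the buffer fills or the file is closed, which is where the saving comes from.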

Results

All tests and benchmarks were run on a ~5-year-old machine:

CPU: (Skylake) Intel i7-6700 CPU @ 3.40GHz
        turbo boost up to 4GHz
        4 cores / 8 threads
        Cache L1: 	64K (per core)
        Cache L2: 	256K (per core)
        Cache L3: 	8MB (shared)
RAM: 32GB dual-channel (Crucial CT16G4DFRA266  2 x 16 GB DDR4 2666 MHz)

Generate

Generates 10M random coordinate pairs and prints them out in JSON format. Time: 8.84s

Basic implementation

  • read whole file in memory
  • parse (using default std.json.parser) into a struct
  • loop over and do the math

The only optimization, more or less, is inlining the haversine function. The rest is left to the compiler.

1.4 million haversines/second

zig version 0.11.0-dev.1593+d24ebf1d1 (before this change)

BEST SUB-RESULTS:
(test-iterations: 20, input size: 10000000, float: f64)
        avg harvestine:          10008.47710
        read time:               828.909ms
        parse time:              5.415s
        math time:               654.993ms
        total time:              7.046s
        throughput:              1419120 haversines/second

zig version 0.11.0

BEST SUB-RESULTS:
(test-iterations: 20, input size: 10000000, float: f64)
        avg haversine:          10006.62310
        read time:               1.281s
        parse time:              4.794s
        math time:               650.867ms
        total time:              6.834s
        throughput:              1463088 haversines/second
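
The throughput figures appear to be the input size divided by the total time. A small sketch of that arithmetic, with hypothetical helper names: for the 0.11.0 run, 10000000 pairs over the printed 6.834s gives roughly 1.46 million haversines/second, within about 0.02% of the reported value (the printed time only has millisecond precision), and the parse step accounts for about 70% of the total. Because these are best sub-results, the individual steps need not come from the same iteration, so the share is only indicative.

```c
/* Hypothetical helper: items processed per second of wall-clock time. */
double throughput(double items, double total_seconds)
{
    return total_seconds > 0.0 ? items / total_seconds : 0.0;
}

/* Hypothetical helper: fraction of the total time spent in one step. */
double time_share(double step_seconds, double total_seconds)
{
    return total_seconds > 0.0 ? step_seconds / total_seconds : 0.0;
}
```

For example, `throughput(10000000.0, 6.834)` and `time_share(4.794, 6.834)` reproduce the two figures quoted above.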