1BRC is a challenge, originally in Java to process a text file of weather station names and temperatures, and for each weather station, print out the minimum, mean, and maximum. It sounds simple, but the catch is that the file contains one billion rows. Fun!
I took to solve the challenge in two parts - first being to optimize the creation of one billion measurements, and then to actually process them as per the original challenge.
- The goal here is to create a file with one billion rows, each row containing a weather station name and a temperature.
- There are baseline temperatures for each station given, and the temperature for each row is the baseline temperature for that station plus a random number between -10 and 10.
- The file is a text file with each line of the format
station_name;temperature
. - The fastest solution I've implemented so far creates the file with 1 billion measurements in 205 seconds. The code is at
lib/measurements_generator.ex
. - For more details, check out the doc.
run create_measurements 1000
- creates a file./data/measurements.1000.txt
with 1000 measurements. You can change 1000 to the number of measurements you want.run create_measurements.profile 1000
- creates a file./data/measurements.1000.txt
with 1000 measurements and profiles the execution usingeprof
.- Check
./bin/run
to see what the above commands do.
- WIP.
- For more details, check out the doc.
run create_baseline_results 1000
- creates a file./data/result_baseline.10000.json
with the correct results for 1000 measurements. This file can then be used to verify the correctness of results we get while trying to optimize the processing.run process_measurements 1000
- processes the file./data/measurements.1000.txt
with 1000 measurements, generatesresult.10000.json
& verifies their correctness with the file./data/result_baseline.10000.json
.run process_measurements.profile 1000
- processes the file./data/measurements.1000.txt
with 1000 measurements and profiles the execution usingeprof
.- Check
./bin/run
to see what the above commands do.
- This repo uses Nix flakes to manage dependencies. To get started, run
direnv allow
in the root of the repo to activate the Nix environment. - Execute
run deps
to install dependencies. - To create measurements file, execute
run create_measurements 1_000_000
. This will create a filedata/measurements.1000000.txt
with 1 million measurements. - To process the measurements, execute
run process_measurements 1_000_000
. This will process the file created in the previous step. - That's it, you can follow the codepaths, starting from
bin/run
to explore further ✌️