1️⃣🐝🏎️ The One Billion Row Challenge -- A fun exploration of how quickly 1B rows from a text file can be aggregated. The challenge was primarily foces on Java but I decided to solve it in Golang!
I wrote a detailed blog about my implementation approach, you can check it out here.
Final implementation approach looks like this:
Here is a more detailed record of each individual iteration:
Attempt Number | Approach | Execution Time | Diff | Commit |
---|---|---|---|---|
0 | Naive Implementation: Read temperatures into a map of cities. Iterate serially over each key (city) in map to find min, max and average temperatures. | 6:13.15 | ||
1 | Evaluate each city in map concurrently using goroutines. | 4:32.80 | -100.35 | 8bd5f43 |
2 | Remove sorting float64 slices. Calculate min, max and average by iterating. | 4:25.59 | -7.21 | 830e5df |
3 | Decouple reading and processing of file content. A buffered goroutine is used to communicate between the two processes. | 5:22.83 | +57.24 | 2babf7d |
4 | Instead of sending each line to the channel, now sending 100 lines chunked together. Also, to minimise garbage collection, not freeing up memory when resetting a slice. | 3:41.76 | -161.07 | b7b1781 |
5 | Read file in chunks of 100 MB instead of reading line by line. | 3:32.62 | -9.14 | c26fea4 |
6 | Convert temperature from string to int64 , process in int64 and convert to float64 at the end. |
2:51.50 | -41.14 | 7812da4 |
7 | In the city <> temperatures map, replaced the value for each key (city) to preprocessed min, max, count and sum of all temperatures instead of storing all recorded temperatures for the city. | 1:39.81 | -71.79 | e5213a8 |
8 | Use producer consumer pattern to read file in chunks and process the chunks in parallel. | 1:43.82 | +14.01 | 067f2a4 |
9 | Reduce memory allocation by processing each read chunk into a map. Result channel now can collate the smaller processed chunk maps. | 0:28.544 | -75.286 | d4153ac |
10 | Avoid string concatenation overhead by not reading the decimal point when processing city temperature. | 0:24.571 | -3.973 | 90f2fe1 |
11 | Convert byte slice to string directly instead of using a strings.Builder . |
0:18.910 | -5.761 | 88bb6da |
12 | Replace strconv.ParseInt with a custom string to int parser. |
0:14.008 | -4.902 | 17d575f |
13 | Reduce map access calls when constructing final result string. | 0:12.017 | -1.9991 |