/1trc

Primary LanguagePythonBSD 3-Clause "New" or "Revised" LicenseBSD-3-Clause

One Trillion Row Challenge

Inspired by Gunnar Morling's one billion row challenge, we thought we'd take things one step further and start the one trillion row challenge (1TRC).

Data Generation

You can generate the dataset yourself using the data generation script, adapted from Jacob Tomlinson's data generation script. We've also hosted the dataset in a requester pays S3 bucket s3://coiled-datasets-rp/1trc in us-east-1.

It draws a random sample of weather stations and normally distributed temperatures drawn from the mean for each station based on the values in lookup.csv.

The Challenge

The main task, like the 1BRC, is to calculate the min, mean, and max values per weather station, sorted alphabetically.