Simple-data-analysis benchmarks

Comparing the performance of different versions of simple-data-analysis with popular Python and R libraries for data analysis.

To test the performance of simple-data-analysis@2.0.1, we calculated the average temperature per decade and city using the daily temperatures from the Adjusted and Homogenized Canadian Climate Data (AHCCD).

We ran the same calculations with simple-data-analysis@1.8.1 (on both Node.js and Bun), Pandas (Python), and the tidyverse (R).

In each script, we perform the following steps (a code sketch follows the list):

  1. Load a CSV file (Importing)
  2. Select four columns, remove rows with missing temperatures, and convert date strings to dates and temperature strings to floats (Cleaning)
  3. Add a new decade column computed from each date (Modifying)
  4. Calculate the average temperature per decade and city (Summarizing)
  5. Write the cleaned-up data, from which the averages were computed, to a new CSV file (Writing)
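
To make these steps concrete, here is a minimal sketch of the same pipeline using Pandas, one of the libraries compared here. It is an illustration only: the column names (station, city, date, temp) and the output file name are assumptions, not the actual AHCCD headers or the exact script from this repository.

```python
import pandas as pd

# 1. Importing: load the CSV file
df = pd.read_csv("ahccd-samples.csv")

# 2. Cleaning: keep four columns, drop rows with missing temperatures,
#    and parse dates and temperatures (column names are hypothetical)
df = df[["station", "city", "date", "temp"]]
df = df.dropna(subset=["temp"])
df["date"] = pd.to_datetime(df["date"])
df["temp"] = df["temp"].astype(float)

# 3. Modifying: derive the decade from the date (e.g. 1987 -> 1980)
df["decade"] = df["date"].dt.year // 10 * 10

# 4. Summarizing: average temperature per decade and city
averages = df.groupby(["decade", "city"], as_index=False)["temp"].mean()

# 5. Writing: save the cleaned-up data to a new CSV file
df.to_csv("ahccd-cleaned.csv", index=False)
```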

Each script was run ten times on a MacBook Pro (Apple M1 Pro, 16 GB of RAM), and the durations were averaged.
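
A run-and-average harness can be as simple as the sketch below: launch a benchmark script ten times as a child process, time each run, and average the durations. This is only one possible setup, and the script path is a placeholder, not the repository's actual harness.

```python
import statistics
import subprocess
import time

RUNS = 10
durations = []

for _ in range(RUNS):
    start = time.perf_counter()
    # The path below is a placeholder for one of the benchmark scripts
    subprocess.run(["python", "pandas/main.py"], check=True)
    durations.append(time.perf_counter() - start)

print(f"Average duration over {RUNS} runs: {statistics.mean(durations):.2f} s")
```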

The charts displayed below come from this Observable notebook.

Small file

With ahccd-samples.csv:

  • 74.7 MB
  • 19 cities
  • 20 columns
  • 971,804 rows
  • 19,436,080 data points

As we can see, simple-data-analysis@1.8.1 was the slowest, but simple-data-analysis@2.0.1 is now the fastest.

[Chart: processing duration of the scripts in the various languages, small file]

Big file

With ahccd.csv:

  • 1.7 GB
  • 773 cities
  • 20 columns
  • 22,051,025 rows
  • 441,020,500 data points

The file was too big for simple-data-analysis@1.8.1, so it's not included here.

Again, simple-data-analysis@2.0.1 is the fastest option.

[Chart: processing duration of the scripts in the various languages, big file]