/rollup

Rollup time-series data in Rails

Primary LanguageRubyMIT LicenseMIT

Rollup

🔥 Rollup time-series data in Rails

Works great with Ahoy and Searchjoy

Build Status

Installation

Add this line to your application’s Gemfile:

gem "rollups"

For Rails < 6, also add:

gem "activerecord-import"

And run:

bundle install
rails generate rollups
rails db:migrate

Contents

Getting Started

Store the number of users created by day in the rollups table

User.rollup("New users")

Get the series

Rollup.series("New users")
# {
#   Sat, 24 May 2020 => 50,
#   Sun, 25 May 2020 => 100,
#   Mon, 26 May 2020 => 34
# }

Use a rake task or background job to create rollups on a regular basis. Don’t worry too much about naming - you can rename later if needed.

Creating Rollups

Time Column

Specify the time column - created_at by default

User.rollup("New users", column: :joined_at)

Change the default column for a model

class User < ApplicationRecord
  self.rollup_column = :joined_at
end

Time Intervals

Specify the interval - day by default

User.rollup("New users", interval: "week")

And when querying

Rollup.series("New users", interval: "week")

Supported intervals are:

  • hour
  • day
  • week
  • month
  • quarter
  • year

Or any number of minutes or seconds:

  • 1m, 5m, 15m
  • 1s, 30s, 90s

Weeks start on Sunday by default. Change this with:

Rollup.week_start = :monday

Time Zones

The default time zone is Time.zone. Change this with:

Rollup.time_zone = "Pacific Time (US & Canada)"

or

User.rollup("New users", time_zone: "Pacific Time (US & Canada)")

Time zone objects also work. To see a list of available time zones in Rails, run rake time:zones:all.

See date storage for how dates are stored.

Calculations

Rollups use count by default. For other calculations, use:

Order.rollup("Revenue") { |r| r.sum(:revenue) }

Works with count, sum, minimum, maximum, and average. For median and percentile, check out ActiveMedian.

Dimensions

PostgreSQL only

Create rollups with dimensions

Order.group(:platform).rollup("Orders by platform")

Works with multiple groups as well

Order.group(:platform, :channel).rollup("Orders by platform and channel")

Dimension names are determined by the group clause. To set manually, use:

Order.group(:channel).rollup("Orders by source", dimension_names: ["source"])

See how to query dimensions.

Updating Data

When you run a rollup for the first time, the entire series is calculated. When you run it again, newer data is added.

By default, the latest interval stored for a series is recalculated, since it was likely calculated before the interval completed. Earlier intervals aren’t recalculated since the source rows may have been deleted (this also improves performance).

To recalculate the last few intervals, use:

User.rollup("New users", last: 3)

To recalculate a time range, use:

User.rollup("New users", range: 1.week.ago.all_week)

To only store data for completed intervals, use:

User.rollup("New users", current: false)

To clear and recalculate the entire series, use:

User.rollup("New users", clear: true)

To delete a series, use:

Rollup.where(name: "New users", interval: "day").delete_all

Querying Rollups

Single Series

Get a series

Rollup.series("New users")

Specify the interval if it’s not day

Rollup.series("New users", interval: "week")

If a series has dimensions, they must match exactly as well

Rollup.series("Orders by platform and channel", dimensions: {platform: "Web", channel: "Search"})

Multiple Series

PostgreSQL only

Get multiple series grouped by dimensions

Rollup.multi_series("Orders by platform")

Specify the interval if it’s not day

Rollup.multi_series("Orders by platform", interval: "week")

Filter by dimensions

Rollup.where_dimensions(platform: "Web").multi_series("Orders by platform and channel")

Raw Data

Uses the Rollup model to query the data directly

Rollup.where(name: "New users", interval: "day")

List

List names and intervals

Rollup.list

Charts

Rollup works great with Chartkick

<%= line_chart Rollup.series("New users") %>

For multiple series, set a name for each series before charting

series = Rollup.multi_series("Orders by platform")
series.each do |s|
  s[:name] = s[:dimensions]["platform"]
end

Other Topics

Naming

Use any naming convention you prefer. Some ideas are:

  • Human - New users
  • Underscore - new_users
  • Dots - new_users.count

Rename with:

Rollup.rename("Old name", "New name")

Date Storage

Rollup stores both dates and times in the time column depending on the interval. For date intervals (day, week, etc), it stores 00:00:00 for the time part. Cast the time column to a date when querying in SQL to get the correct value.

  • PostgreSQL: time::date
  • MySQL: CAST(time AS date)
  • SQLite: date(time)

Examples

Ahoy

Set the default rollup column for your models

class Ahoy::Visit < ApplicationRecord
  self.rollup_column = :started_at
end

and

class Ahoy::Event < ApplicationRecord
  self.rollup_column = :time
end

Hourly visits

Ahoy::Visit.rollup("Visits", interval: "hour")

Visits by browser

Ahoy::Visit.group(:browser).rollup("Visits by browser")

Unique homepage views

Ahoy::Event.where(name: "Viewed homepage").joins(:visit).rollup("Homepage views") { |r| r.distinct.count(:visitor_token) }

Product views

Ahoy::Event.where(name: "Viewed product").group_prop(:product_id).rollup("Product views")

Searchjoy

Daily searches

Searchjoy::Search.rollup("Searches")

Searches by query

Searchjoy::Search.group(:normalized_query).rollup("Searches by query", dimension_names: ["query"])

Conversion rate

Searchjoy::Search.rollup("Search conversion rate") { |r| r.average("(converted_at IS NOT NULL)::int") }

History

View the changelog

Contributing

Everyone is encouraged to help improve this project. Here are a few ways you can help:

To get started with development:

git clone https://github.com/ankane/rollup.git
cd rollup
bundle install

# create databases
createdb rollup_test
mysqladmin create rollup_test

# run tests
bundle exec rake test