/blazer

Business intelligence made simple

Primary LanguageRubyMIT LicenseMIT

Blazer

Explore your data with SQL. Easily create charts and dashboards, and share them with your team.

Try it out

Screenshot

Blazer is also available as a Docker image.

🍊 Battle-tested at Instacart

Features

  • Multiple data sources - PostgreSQL, MySQL, Redshift, and many more
  • Variables - run the same queries with different values
  • Checks & alerts - get emailed when bad data appears
  • Audits - all queries are tracked
  • Security - works with your authentication system

Docs

Installation

Add this line to your application’s Gemfile:

gem 'blazer'

Run:

rails generate blazer:install
rails db:migrate

And mount the dashboard in your config/routes.rb:

mount Blazer::Engine, at: "blazer"

For production, specify your database:

ENV["BLAZER_DATABASE_URL"] = "postgres://user:password@hostname:5432/database"

Blazer tries to protect against queries which modify data (by running each query in a transaction and rolling it back), but a safer approach is to use a read only user. See how to create one.

Checks (optional)

Be sure to set a host in config/environments/production.rb for emails to work.

config.action_mailer.default_url_options = {host: "blazer.dokkuapp.com"}

Schedule checks to run (with cron, Heroku Scheduler, etc). The default options are every 5 minutes, 1 hour, or 1 day, which you can customize. For each of these options, set up a task to run.

rake blazer:run_checks SCHEDULE="5 minutes"
rake blazer:run_checks SCHEDULE="1 hour"
rake blazer:run_checks SCHEDULE="1 day"

You can also set up failing checks to be sent once a day (or whatever you prefer).

rake blazer:send_failing_checks

Here’s what it looks like with cron.

*/5 * * * * rake blazer:run_checks SCHEDULE="5 minutes"
0   * * * * rake blazer:run_checks SCHEDULE="1 hour"
30  7 * * * rake blazer:run_checks SCHEDULE="1 day"
0   8 * * * rake blazer:send_failing_checks

For Slack notifications, create an incoming webhook and set:

BLAZER_SLACK_WEBHOOK_URL=https://hooks.slack.com/...

Name the webhook “Blazer” and add a cool icon.

Authentication

Don’t forget to protect the dashboard in production.

Basic Authentication

Set the following variables in your environment or an initializer.

ENV["BLAZER_USERNAME"] = "andrew"
ENV["BLAZER_PASSWORD"] = "secret"

Devise

authenticate :user, ->(user) { user.admin? } do
  mount Blazer::Engine, at: "blazer"
end

Other

Specify a before_action method to run in blazer.yml.

before_action_method: require_admin

You can define this method in your ApplicationController.

def require_admin
  # depending on your auth, something like...
  redirect_to root_path unless current_user && current_user.admin?
end

Be sure to render or redirect for unauthorized users.

Permissions

Blazer runs each query in a transaction and rolls it back to prevent queries from modifying data. As an additional line of defense, we recommend using a read only user.

PostgreSQL

Create a user with read only permissions:

BEGIN;
CREATE ROLE blazer LOGIN PASSWORD 'secret123';
GRANT CONNECT ON DATABASE database_name TO blazer;
GRANT USAGE ON SCHEMA public TO blazer;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO blazer;
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO blazer;
COMMIT;

MySQL

Create a user with read only permissions:

GRANT SELECT, SHOW VIEW ON database_name.* TO blazer@’127.0.0.1′ IDENTIFIED BY ‘secret123‘;
FLUSH PRIVILEGES;

MongoDB

Create a user with read only permissions:

db.createUser({user: "blazer", pwd: "password", roles: ["read"]})

Also, make sure authorization is enabled when you start the server.

Sensitive Data

If your database contains sensitive or personal data, check out Hypershield to shield it.

Encrypted Data

If you need to search encrypted data, use blind indexing.

You can have Blazer transform specific variables with:

Blazer.transform_variable = lambda do |name, value|
  value = User.generate_email_bidx(value) if name == "email_bidx"
  value
end

Queries

Variables

Create queries with variables.

SELECT * FROM users WHERE gender = {gender}

Use {start_time} and {end_time} for time ranges. Example

SELECT * FROM ratings WHERE rated_at >= {start_time} AND rated_at <= {end_time}

Smart Variables

Example

Suppose you have the query:

SELECT * FROM users WHERE occupation_id = {occupation_id}

Instead of remembering each occupation’s id, users can select occupations by name.

Add a smart variable with:

smart_variables:
  occupation_id: "SELECT id, name FROM occupations ORDER BY name ASC"

The first column is the value of the variable, and the second column is the label.

You can also use an array or hash for static data and enums.

smart_variables:
  period: ["day", "week", "month"]
  status: {0: "Active", 1: "Archived"}

Linked Columns

Example - title column

Link results to other pages in your apps or around the web. Specify a column name and where it should link to. You can use the value of the result with {value}.

linked_columns:
  user_id: "/admin/users/{value}"
  ip_address: "https://www.infosniper.net/index.php?ip_address={value}"

Smart Columns

Example - occupation_id column

Suppose you have the query:

SELECT name, city_id FROM users

See which city the user belongs to without a join.

smart_columns:
  city_id: "SELECT id, name FROM cities WHERE id IN {value}"

You can also use a hash for static data and enums.

smart_columns:
  status: {0: "Active", 1: "Archived"}

Caching

Blazer can automatically cache results to improve speed. It can cache slow queries:

cache:
  mode: slow
  expires_in: 60 # min
  slow_threshold: 15 # sec

Or it can cache all queries:

cache:
  mode: all
  expires_in: 60 # min

Of course, you can force a refresh at any time.

Charts

Blazer will automatically generate charts based on the types of the columns returned in your query.

Note: The order of columns matters.

Line Chart

There are two ways to generate line charts.

2+ columns - timestamp, numeric(s) - Example

SELECT date_trunc('week', created_at), COUNT(*) FROM users GROUP BY 1

3 columns - timestamp, string, numeric - Example

SELECT date_trunc('week', created_at), gender, COUNT(*) FROM users GROUP BY 1, 2

Column Chart

There are also two ways to generate column charts.

2+ columns - string, numeric(s) - Example

SELECT gender, COUNT(*) FROM users GROUP BY 1

3 columns - string, string, numeric - Example

SELECT gender, zip_code, COUNT(*) FROM users GROUP BY 1, 2

Scatter Chart

2 columns - both numeric - Example

SELECT x, y FROM table

Pie Chart

2 columns - string, numeric - and last column named pie - Example

SELECT gender, COUNT(*) AS pie FROM users GROUP BY 1

Maps

Columns named latitude and longitude or lat and lon or lat and lng - Example

SELECT name, latitude, longitude FROM cities

To enable, get an access token from Mapbox and set ENV["MAPBOX_ACCESS_TOKEN"].

Targets

Use the column name target to draw a line for goals. Example

SELECT date_trunc('week', created_at), COUNT(*) AS new_users, 100000 AS target FROM users GROUP BY 1

Dashboards

Create a dashboard with multiple queries. Example

If the query has a chart, the chart is shown. Otherwise, you’ll see a table.

If any queries have variables, they will show up on the dashboard.

Checks

Checks give you a centralized place to see the health of your data. Example

Create a query to identify bad rows.

SELECT * FROM ratings WHERE user_id IS NULL /* all ratings should have a user */

Then create check with optional emails if you want to be notified. Emails are sent when a check starts failing, and when it starts passing again.

Cohorts

Create a cohort analysis from a simple SQL query. Example

Create a query with the comment /* cohort analysis */. The result should have columns named user_id and conversion_time and optionally cohort_time.

You can generate cohorts from the first conversion time:

/* cohort analysis */
SELECT user_id, created_at AS conversion_time FROM orders

(the first conversion isn’t counted in the first time period with this format)

Or from another time, like sign up:

/* cohort analysis */
SELECT users.id AS user_id, orders.created_at AS conversion_time, users.created_at AS cohort_time
FROM users LEFT JOIN orders ON orders.user_id = users.id

This feature requires PostgreSQL.

Anomaly Detection

Blazer supports two different approaches to anomaly detection.

Trend

Trend is easiest to set up. By default, it uses an external service, but you can run it on your own infrastructure as well.

Add trend to your Gemfile:

gem 'trend'

And add to config/blazer.yml:

anomaly_checks: trend

For the self-hosted API, create an initializer with:

Trend.url = "http://localhost:8000"

R

R is harder to set up but doesn’t use an external service. It uses Twitter’s AnomalyDetection library.

First, install R. Then, run:

install.packages("remotes")
remotes::install_github("twitter/AnomalyDetection")

And add to config/blazer.yml:

anomaly_checks: r

If upgrading from version 1.4 or below, also follow the upgrade instructions.

If you’re on Heroku, follow the additional instructions below.

R on Heroku

Add the R buildpack to your app.

heroku buildpacks:add --index 1 https://github.com/virtualstaticvoid/heroku-buildpack-r.git

And create an init.R with:

if (!"AnomalyDetection" %in% installed.packages()) {
  install.packages("remotes")
  remotes::install_github("twitter/AnomalyDetection")
}

Commit and deploy away. The first deploy may take a few minutes.

Forecasting

Blazer supports for two different forecasting methods. Example

A forecast link will appear for queries that return 2 columns with types timestamp and numeric.

Prophet

Add prophet to your Gemfile:

gem 'prophet-rb', '>= 0.2.1'

And add to config/blazer.yml:

forecasting: prophet

Trend

Trend uses an external service.

Add trend to your Gemfile:

gem 'trend'

And add to config/blazer.yml:

forecasting: trend

Uploads

Creating database tables from CSV files. Example

Run:

rails generate blazer:uploads
rails db:migrate

And add to config/blazer.yml:

uploads:
  url: postgres://...
  schema: uploads
  data_source: main

This feature requires PostgreSQL. Create a new schema just for uploads.

CREATE SCHEMA uploads;

Data Sources

Blazer supports multiple data sources 🎉

Add additional data sources in config/blazer.yml:

data_sources:
  main:
    url: <%= ENV["BLAZER_DATABASE_URL"] %>
    # timeout, smart_variables, linked_columns, smart_columns
  catalog:
    url: <%= ENV["CATALOG_DATABASE_URL"] %>
    # ...
  redshift:
    url: <%= ENV["REDSHIFT_DATABASE_URL"] %>
    # ...

Full List

You can also create an adapter for any other data store.

Note: In the examples below, we recommend using environment variables for urls.

data_sources:
  my_source:
    url: <%= ENV["BLAZER_MY_SOURCE_URL"] %>

Amazon Athena

Add aws-sdk-athena and aws-sdk-glue to your Gemfile and set:

data_sources:
  my_source:
    adapter: athena
    database: database
    output_location: s3://some-bucket/

Amazon Redshift

Add activerecord6-redshift-adapter or activerecord5-redshift-adapter to your Gemfile and set:

data_sources:
  my_source:
    url: redshift://user:password@hostname:5439/database

Apache Drill

Add drill-sergeant to your Gemfile and set:

data_sources:
  my_source:
    adapter: drill
    url: http://hostname:8047

Cassandra

Add cassandra-driver to your Gemfile and set:

data_sources:
  my_source:
    url: cassandra://user:password@hostname:9042/keyspace

Druid

Enable SQL support on the broker and set:

data_sources:
  my_source:
    adapter: druid
    url: http://hostname:8082

Elasticsearch

Add elasticsearch and elasticsearch-xpack to your Gemfile and set:

data_sources:
  my_source:
    adapter: elasticsearch
    url: http://user:password@hostname:9200

Google BigQuery

Add google-cloud-bigquery to your Gemfile and set:

data_sources:
  my_source:
    adapter: bigquery
    project: your-project
    keyfile: path/to/keyfile.json

IBM DB2 and Informix

Add ibm_db to your Gemfile and set:

data_sources:
  my_source:
    url: ibm-db://user:password@hostname:50000/database

InfluxDB

Add influxdb to your Gemfile and set:

data_sources:
  my_source:
    adapter: influxdb
    url: http://user:password@hostname:8086/database

Supports InfluxQL

MongoDB

Add mongo to your Gemfile and set:

data_sources:
  my_source:
    url: mongodb://user:password@hostname:27017/database

MySQL

Add mysql2 to your Gemfile (if it’s not there) and set:

data_sources:
  my_source:
    url: mysql2://user:password@hostname:3306/database

Neo4j

Add neo4j-core to your Gemfile and set:

data_sources:
  my_source:
    adapter: neo4j
    url: http://user:password@hostname:7474

Oracle

Add activerecord-oracle_enhanced-adapter and ruby-oci8 to your Gemfile and set:

data_sources:
  my_source:
    url: oracle-enhanced://user:password@hostname:1521/database

PostgreSQL

Add pg to your Gemfile (if it’s not there) and set:

data_sources:
  my_source:
    url: postgres://user:password@hostname:5432/database

Presto

Add presto-client to your Gemfile and set:

data_sources:
  my_source:
    url: presto://user@hostname:8080/catalog

Salesforce

Add restforce to your Gemfile and set:

data_sources:
  my_source:
    adapter: salesforce

And set the appropriate environment variables:

SALESFORCE_USERNAME="username"
SALESFORCE_PASSWORD="password"
SALESFORCE_SECURITY_TOKEN="security token"
SALESFORCE_CLIENT_ID="client id"
SALESFORCE_CLIENT_SECRET="client secret"
SALESFORCE_API_VERSION="41.0"

Supports SOQL

Socrata Open Data API (SODA)

Set:

data_sources:
  my_source:
    adapter: soda
    url: https://soda.demo.socrata.com/resource/4tka-6guv.json
    app_token: ...

Supports SoQL

Snowflake

First, install ODBC. For Homebrew, use:

brew install unixodbc

For Ubuntu, use:

sudo apt-get install unixodbc-dev

For Heroku, use the Apt buildpack and create an Aptfile with:

unixodbc-dev
https://sfc-repo.snowflakecomputing.com/odbc/linux/2.21.5/snowflake-odbc-2.21.5.x86_64.deb

This installs the driver at /app/.apt/usr/lib/snowflake/odbc/lib/libSnowflake.so

Then, download the Snowflake ODBC driver. Add odbc_adapter to your Gemfile and set:

data_sources:
  my_source:
    adapter: snowflake
    conn_str: Driver=/path/to/libSnowflake.so;uid=user;pwd=password;server=host.snowflakecomputing.com

SQLite

Add sqlite3 to your Gemfile and set:

data_sources:
  my_source:
    url: sqlite3:path/to/database.sqlite3

SQL Server

Add tiny_tds and activerecord-sqlserver-adapter to your Gemfile and set:

data_sources:
  my_source:
    url: sqlserver://user:password@hostname:1433/database

Creating an Adapter

Create an adapter for any data store with:

class FooAdapter < Blazer::Adapters::BaseAdapter
  # code goes here
end

Blazer.register_adapter "foo", FooAdapter

See the Presto adapter for a good example. Then use:

data_sources:
  my_source:
    adapter: foo
    url: http://user:password@hostname:9200/

Query Permissions

Blazer supports a basic permissions model.

  1. Queries without a name are unlisted
  2. Queries whose name starts with # are only listed to the creator
  3. Queries whose name starts with * can only be edited by the creator

Learn SQL

Have team members who want to learn SQL? Here are a few great, free resources.

Useful Tools

For an easy way to group by day, week, month, and more with correct time zones, check out Groupdate.sql.

Standalone Version

Looking for a standalone version? Check out Ghost Blazer.

Performance

By default, queries take up a request while they are running. To run queries asynchronously, add to your config:

async: true

Note: Requires Rails 5+ and caching to be enabled. If you have multiple web processes, your app must use a centralized cache store like Memcached or Redis.

config.cache_store = :mem_cache_store

Archiving

Archive queries that haven’t been viewed in over 90 days.

rake blazer:archive_queries

Content Security Policy

If views are stuck with a Loading... message, there might be a problem with strict CSP settings in your app. This can be checked with Firefox or Chrome dev tools. You can allow Blazer to override these settings for its controllers with:

override_csp: true

Upgrading

2.3

To archive queries, create a migration

rails g migration add_status_to_blazer_queries

with:

add_column :blazer_queries, :status, :string
Blazer::Query.update_all(status: "active")

2.0

To use Slack notifications, create a migration

rails g migration add_slack_channels_to_blazer_checks

with:

add_column :blazer_checks, :slack_channels, :text

History

View the changelog

Thanks

Blazer uses a number of awesome open source projects, including Rails, Vue.js, jQuery, Bootstrap, Selectize, StickyTableHeaders, Stupid jQuery Table Sort, and Date Range Picker.

Demo data from MovieLens.

Want to Make Blazer Better?

That’s awesome! Here are a few ways you can help:

Check out the dev app to get started.