👉 Send every web request and model update to BigQuery
✋ Skip or anonymise fields containing PII
✌️ Configure and forget
This gem provides an opinionated integration with Google BigQuery.
Once it is set up, every web request and database update (as permitted by configuration) will flow to BigQuery.
It also provides a Rake task for backfilling BigQuery with models created before you started sending events (see Importing existing data below), and one for keeping your field configuration up to date.
To set the gem up, follow the steps in "Configuration" below.
A Rails model is an analytics Entity.
A change to a model (including creation and deletion) is an analytics Event. When a model changes we send the entire new state of the model as part of the event.
A web request is also an analytics Event.
sequenceDiagram
participant Client
participant Analytics middleware
participant Controller
participant Model
participant RequestStore
Client->>+Controller: GET /index
activate Controller
Analytics middleware-->>RequestStore: Store request UUID
Controller->>Model: Update model
Model->>Analytics: after_update hook
Analytics-->>RequestStore: Retrieve request UUID
Analytics->>ActiveJob: enqueue Event with serialized model state and request UUID
Controller->>Analytics: after_action to send request event
Analytics->>ActiveJob: enqueue Event with serialized request and request UUID
Controller->>Client: 200 OK
deactivate Controller
ActiveJob->>ActiveJob: pump serialized Events to BigQuery
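The diagram above can be approximated in plain Ruby to show how one request UUID ties a model event and a request event together. This is only a sketch, not the gem's code: `SketchStore` stands in for RequestStore, `EVENT_QUEUE` stands in for ActiveJob, and the `entity_updated` event type and both helper methods are hypothetical names chosen for illustration.

```ruby
require 'securerandom'

# Stand-in for RequestStore: a per-thread hash (assumption for this sketch).
module SketchStore
  def self.store
    Thread.current[:sketch_store] ||= {}
  end

  def self.clear!
    Thread.current[:sketch_store] = {}
  end
end

# Hypothetical queue standing in for ActiveJob.
EVENT_QUEUE = []

# Middleware step: tag the request with a UUID before the controller runs,
# and clear the store once the request has finished.
def with_request_uuid
  SketchStore.store[:request_uuid] = SecureRandom.uuid
  yield
ensure
  SketchStore.clear!
end

# Both the model hook and the controller's after_action read the same UUID
# from the store when they enqueue an event.
def enqueue_event(type, data)
  EVENT_QUEUE << { type: type, request_uuid: SketchStore.store[:request_uuid], data: data }
end

with_request_uuid do
  enqueue_event('entity_updated', { id: 1, name: 'New name' })  # model after_update hook
  enqueue_event('web_request', { path: '/index', status: 200 }) # controller after_action
end
```

Because both events carry the same `request_uuid`, a model change can later be joined back to the web request that caused it.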
A Rails app with ActiveJob configured.
Add the gem to your Gemfile:
gem 'dfe-analytics'
then run
bundle install
bundle exec rails generate dfe:analytics:install
and follow the comments in config/initializers/dfe-analytics.yml.
The dfe:analytics:install generator will also initialize some empty config files:
Filename | Purpose |
---|---|
config/analytics.yml | List all fields we will send to BigQuery |
config/analytics_pii.yml | List all fields we will obfuscate before sending to BigQuery. This should be a subset of fields in analytics.yml |
config/analytics_blocklist.yml | Autogenerated file to list all fields we will NOT send to BigQuery, to support the dfe:analytics:check task |
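To make the relationship between the first two files concrete, here is a minimal sketch of allowlisting and obfuscation. It is not the gem's implementation: the `ALLOWED_FIELDS` and `PII_FIELDS` hashes are hypothetical in-memory stand-ins for analytics.yml and analytics_pii.yml, the `analytics_payload` helper is invented for illustration, and SHA-256 hashing is an assumed obfuscation scheme.

```ruby
require 'digest'

# Hypothetical equivalents of analytics.yml and analytics_pii.yml
# for a single `users` table.
ALLOWED_FIELDS = { users: %w[id email created_at] }.freeze
PII_FIELDS     = { users: %w[email] }.freeze

# Build the payload for one record: drop every field that is not allowlisted,
# then replace PII values with a SHA-256 digest (assumed scheme).
def analytics_payload(table, attributes)
  allowed = ALLOWED_FIELDS.fetch(table, [])
  pii     = PII_FIELDS.fetch(table, [])

  attributes.slice(*allowed).to_h do |field, value|
    [field, pii.include?(field) ? Digest::SHA256.hexdigest(value.to_s) : value]
  end
end

payload = analytics_payload(
  :users,
  { 'id' => 1, 'email' => 'a@example.com', 'password_digest' => 'secret' }
)
```

Here `password_digest` never reaches the payload because it is not allowlisted, while `email` is sent only in hashed form because it appears in both lists.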
A good place to start is to run
bundle exec rails dfe:analytics:regenerate_blocklist
to populate analytics_blocklist.yml. Work through this file to move entries
into analytics.yml and optionally also to analytics_pii.yml.
Finally, run
bundle exec rails dfe:analytics:check
This reports any fields that are present in a model but missing from your field configuration, and any that are present in the configuration but missing from the model.
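The comparison the check performs can be pictured as two set differences. The sketch below uses made-up column and field names and is not how the task is implemented internally.

```ruby
# Hypothetical inputs: the columns a model actually has, and the fields
# listed for that model in your configuration.
model_columns     = %w[id email created_at last_signed_in_at]
configured_fields = %w[id email created_at legacy_token]

# Present in the model but not mentioned in the configuration.
missing_from_config = model_columns - configured_fields

# Mentioned in the configuration but no longer a column on the model.
missing_from_model = configured_fields - model_columns
```

In this example `last_signed_in_at` would need adding to one of the config files, and `legacy_token` would need removing.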
It's recommended to run this task regularly - at least as often as you run database migrations. Consider enhancing db:migrate to run it automatically.
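One way to do this is with Rake's `Task#enhance`, which appends an action to an existing task. The sketch below is an assumption about how you might wire it up, for example in a file such as lib/tasks/analytics_check.rake (a hypothetical filename); the helper method name is also invented for illustration.

```ruby
require 'rake'

# Hypothetical helper: make db:migrate run the field check once it finishes.
# Task#enhance with a block appends an action to an existing task.
def run_analytics_check_after_migrate
  Rake::Task['db:migrate'].enhance do
    Rake::Task['dfe:analytics:check'].invoke
  end
end

# In a Rails app db:migrate is typically already defined when files under
# lib/tasks are loaded; the guard keeps this file harmless where it is not.
run_analytics_check_after_migrate if Rake::Task.task_defined?('db:migrate')
```

With this in place, a migration that adds or removes a column should surface any resulting mismatch in the field configuration straight away.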
Mix in the following modules. It's recommended to include them at the
highest possible level in the inheritance hierarchy of your controllers and
models so that they are effective everywhere. A standard Rails application will
have all controllers inheriting from ApplicationController and all models
inheriting from ApplicationRecord, so these should be a good place to start.
class ApplicationController < ActionController::Base
include DfE::Analytics::Requests
# This method MUST be present in your controller and should return
# either nil or an object implementing an .id method.
#
# def current_user; end
# This method MAY be present in your controller. If so, it should
# return a string - return value will be attached to web_request events.
#
# def current_namespace; end
end
class ApplicationRecord < ActiveRecord::Base
include DfE::Analytics::Entities
end
If everything has worked, you should see jobs flowing into your queues on each
web request and model update. While you’re setting things up, consider setting
the config options async: false and log_only: true to take ActiveJob and
BigQuery (respectively) out of the loop.
To import existing data, run:
bundle exec rails dfe:analytics:import_all_entities
To reimport just one model, run:
bundle exec rails dfe:analytics:import_entity[ModelName]
Make a copy of this repository, run bundle install, then bundle exec rspec to run the tests.
The gem is available as open source under the terms of the MIT License.