
Language: Ruby. License: MIT.

Distribute Reads

Scale database reads to replicas in Rails

🍊 Battle-tested at Instacart


Installation

Add this line to your application’s Gemfile:

gem 'distribute_reads'

How to Use

Makara does most of the work. First, update database.yml to use it:

default: &default
  url: postgresql-makara:///
  makara:
    sticky: true
    connections:
      - role: master
        name: primary
        url: <%= ENV["DATABASE_URL"] %>
      - name: replica
        url: <%= ENV["REPLICA_DATABASE_URL"] %>

development:
  <<: *default

production:
  <<: *default

Note: You can use the same instance for the primary and replica in development.

By default, all reads go to the primary instance. To use the replica, do:

distribute_reads { User.count }

Works with multiple queries as well.

distribute_reads do
  User.find_each do |user|                 # replica
    user.orders_count = user.orders.count  # replica
    user.save!                             # primary
  end
end
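To make the block semantics concrete, here is a toy, self-contained model in plain Ruby. It is not the gem's implementation: the real routing happens in Makara's connection proxy, which also handles stickiness and other cases ignored here. ToyRouter, target_for, and the thread-local key are hypothetical names. The sketch shows a block-scoped flag that lets reads use a replica while writes still go to the primary.

```ruby
# Toy model only (hypothetical names): a thread-local flag set by a block
# decides whether reads may use a replica. Writes always use the primary.
module ToyRouter
  KEY = :toy_distribute_reads

  module_function

  # Enables replica reads for the duration of the block on this thread,
  # then restores the previous value so nested blocks behave.
  def distribute_reads
    previous = Thread.current[KEY]
    Thread.current[KEY] = true
    yield
  ensure
    Thread.current[KEY] = previous
  end

  # Crude classification: only SELECT statements count as reads.
  def target_for(sql)
    return :primary unless sql.lstrip.upcase.start_with?("SELECT")
    Thread.current[KEY] ? :replica : :primary
  end
end

log = []
ToyRouter.distribute_reads do
  log << ToyRouter.target_for("SELECT COUNT(*) FROM orders WHERE user_id = 1")
  log << ToyRouter.target_for("UPDATE users SET orders_count = 3 WHERE id = 1")
end
log << ToyRouter.target_for("SELECT * FROM users")
p log # => [:replica, :primary, :primary]
```

Restoring the previous value in ensure (rather than setting it to false) keeps the model correct if blocks are nested or an exception escapes the block.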

Jobs

Distribute all reads in a job with:

class TestJob < ApplicationJob
  distribute_reads

  def perform
    # ...
  end
end

You can pass any of the options below as well, for example distribute_reads(max_lag: 3, lag_failover: true).

Lazy Evaluation

ActiveRecord evaluates queries lazily, so a relation built inside a distribute_reads block may not run until after the block returns. In that case, the primary is used.

users = distribute_reads { User.where(orders_count: 1) } # not executed yet

Call to_a inside the block to ensure the query runs on a replica.

users = distribute_reads { User.where(orders_count: 1).to_a }
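The timing issue can be illustrated with a toy model in plain Ruby. ToyRelation, toy_distribute_reads, and the thread-local key are hypothetical stand-ins, not ActiveRecord or gem APIs; the point is only that the target is decided when the relation runs, not when it is built.

```ruby
# Toy model only (hypothetical names): the target is decided when the
# relation runs, not when it is built.
TOY_REPLICA_KEY = :toy_replica_reads

def toy_distribute_reads
  previous = Thread.current[TOY_REPLICA_KEY]
  Thread.current[TOY_REPLICA_KEY] = true
  yield
ensure
  Thread.current[TOY_REPLICA_KEY] = previous
end

class ToyRelation
  attr_reader :executed_on

  def initialize(sql)
    @sql = sql
    @executed_on = nil
  end

  # Loads once, recording which target was active at that moment.
  def to_a
    @executed_on ||= Thread.current[TOY_REPLICA_KEY] ? :replica : :primary
    [] # no real rows in this model
  end
end

# Like returning User.where(...) from the block: runs later, on the primary.
lazy = toy_distribute_reads { ToyRelation.new("SELECT * FROM users WHERE orders_count = 1") }
lazy.to_a
puts lazy.executed_on # => primary

# Like calling .to_a inside the block: runs on the replica.
eager = toy_distribute_reads do
  relation = ToyRelation.new("SELECT * FROM users WHERE orders_count = 1")
  relation.to_a
  relation
end
puts eager.executed_on # => replica
```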

Options

Replica Lag

Raise an error when replica lag is too high (max_lag is in seconds):

distribute_reads(max_lag: 3) do
  # raises DistributeReads::TooMuchLag
end

Instead of raising an error, you can fall back to the primary:

distribute_reads(max_lag: 3, lag_failover: true) do
  # ...
end

If you have multiple databases, only the ActiveRecord::Base connection is checked for lag by default. Specify the connections to check with lag_on:

distribute_reads(max_lag: 3, lag_on: [ApplicationRecord, LogRecord]) do
  # ...
end

Note: If lag on any connection exceeds the max lag and lag failover is used, all connections will use their primary.
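The lag rules above can be summarized as a small decision function. This is a hypothetical sketch of the documented behavior, not the gem's code: route_for_lag and TooMuchLagError are made-up names (the gem itself raises DistributeReads::TooMuchLag), and it assumes lag values are already known numbers.

```ruby
# Hypothetical sketch of the documented max_lag / lag_failover rules.
# The gem itself raises DistributeReads::TooMuchLag; this is a stand-in.
class TooMuchLagError < StandardError; end

# lags: { connection_name => lag_in_seconds } for each checked connection.
# Returns one target shared by all connections, or raises.
def route_for_lag(lags, max_lag:, lag_failover: false)
  too_laggy = lags.select { |_name, lag| lag > max_lag }
  return :replica if too_laggy.empty?
  return :primary if lag_failover # one laggy connection sends everything to primary

  raise TooMuchLagError, "lag over #{max_lag}s on #{too_laggy.keys.join(", ")}"
end

lags = { "ApplicationRecord" => 1.2, "LogRecord" => 4.5 }
route_for_lag(lags, max_lag: 5)                     # => :replica
route_for_lag(lags, max_lag: 3, lag_failover: true) # => :primary
begin
  route_for_lag(lags, max_lag: 3)
rescue TooMuchLagError => e
  puts e.message # lag over 3s on LogRecord
end
```

Returning a single target for every connection mirrors the note above: one connection over the limit is enough to send all of them to their primary.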

Availability

If no replicas are available, the primary is used. To keep this from overloading the primary, you can raise an error instead:

distribute_reads(failover: false) do
  # raises DistributeReads::NoReplicasAvailable
end
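A minimal sketch of the failover choice, under stated assumptions: choose_read_target and NoReplicasError are hypothetical names (the gem raises DistributeReads::NoReplicasAvailable), and picking the first reachable replica is a simplification, since real replica selection is left to Makara.

```ruby
# Hypothetical sketch of the documented failover option.
# The gem itself raises DistributeReads::NoReplicasAvailable.
class NoReplicasError < StandardError; end

# available_replicas: replicas currently reachable. Taking the first one is a
# simplification; real replica selection is left to Makara.
def choose_read_target(available_replicas, failover: true)
  return available_replicas.first unless available_replicas.empty?
  raise NoReplicasError, "no replicas available" unless failover

  :primary
end

choose_read_target(["replica"]) # => "replica"
choose_read_target([])          # => :primary
begin
  choose_read_target([], failover: false)
rescue NoReplicasError => e
  puts e.message # no replicas available
end
```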

Default Options

Change the default options with:

DistributeReads.default_options = {
  lag_failover: true,
  failover: false
}
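How defaults and per-call options combine can be sketched with a plain Hash#merge. This assumes options passed to distribute_reads override DistributeReads.default_options key by key; DEFAULTS and effective_options are hypothetical names used only for illustration.

```ruby
# Hypothetical sketch: per-call options merged over global defaults,
# so a value passed to the block wins (plain Hash#merge semantics).
DEFAULTS = { lag_failover: true, failover: false }.freeze

def effective_options(**per_call)
  DEFAULTS.merge(per_call)
end

opts = effective_options(max_lag: 3, failover: true)
puts opts[:failover]     # => true (per-call value wins)
puts opts[:lag_failover] # => true (inherited from the defaults)
puts opts[:max_lag]      # => 3
```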

Distribute Reads by Default

At some point, you may wish to distribute reads by default.

DistributeReads.by_default = true

To send queries to the primary, use:

distribute_reads(primary: true) do
  # ...
end
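The interaction between by_default and primary: true can be summarized for reads as follows. This is a hypothetical sketch (read_target and block_options are made-up names) that ignores lag and availability checks; block_options is nil when no distribute_reads block is active.

```ruby
# Hypothetical sketch of where reads go once by_default is involved.
# block_options is nil outside any distribute_reads block.
def read_target(by_default:, block_options: nil)
  if block_options
    block_options[:primary] ? :primary : :replica
  else
    by_default ? :replica : :primary
  end
end

read_target(by_default: false)                                  # => :primary
read_target(by_default: true)                                   # => :replica
read_target(by_default: true, block_options: { primary: true }) # => :primary
read_target(by_default: false, block_options: {})               # => :replica
```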

Reference

Get the replication lag in seconds:

DistributeReads.replication_lag
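One way to use this value is a simple health check. In an app you might call replica_health(5) { DistributeReads.replication_lag }; here the lag reader is passed as a block so the sketch runs without a database. replica_health and its return symbols are hypothetical, and treating nil as "unknown" is a defensive assumption rather than documented behavior.

```ruby
# Hypothetical health check around a lag reading (seconds).
# The block supplies the lag; nil is treated as "unknown" defensively.
def replica_health(threshold_seconds)
  lag = yield
  return :unknown if lag.nil?

  lag > threshold_seconds ? :lagging : :ok
end

replica_health(5) { 0.8 }  # => :ok
replica_health(5) { 12.0 } # => :lagging
replica_health(5) { nil }  # => :unknown
```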

Thanks

Thanks to TaskRabbit for Makara, Sherin Kurian for the max lag option, and Nick Elser for the write-through cache.

History

View the changelog

Contributing

Everyone is encouraged to help improve this project.

To test, run:

git clone https://github.com/ankane/distribute_reads.git
cd distribute_reads
createdb distribute_reads_test_primary
createdb distribute_reads_test_replica
bundle
bundle exec rake