Cache your static-ish models, expire them appropriately, and retrieve them efficiently.
It's the fastest way. At least, that I know of.
There's a longer short story at the end of this README, but effectively: we wanted to reduce the total number of connections made for relatively static objects in a very-high-throughput environment while maintaining parity across multiple independent services. A delightful, unintended consequence is that this happens to be extremely performant.
Do you need it? Unsure. Do you want it? Maybe.
It's like IdentityCache on steroids.
The most basic example:
```ruby
class Business < ApplicationRecord
  has_finder_cache # creates class method `id_finder_cache`
end

Business.id_finder_cache.find(1) # => #<Business id: 1, name: "One fine business">
```
Slightly more advanced:
```ruby
class Category < ApplicationRecord
  has_finder_cache :slug, finds_by: :slug
end

Category.slug_finder_cache.find("funny") # => #<Category id: 1, slug: "funny", name: "Funny stories">
```
And it gets much more configurable from there, but you get it.
```sh
$ bundle add finder_cache
```
Have fun.
Just set `enabled` to `false` in the configuration:
```ruby
FinderCache.setup do |config|
  config.enabled = false
end
```
And then you'll hit the database every time you invoke `#find` on a cache.
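For example, assuming the `Category` model from above:

```ruby
# With config.enabled = false, every lookup goes straight to the database.
Category.slug_finder_cache.find("funny") # runs a fresh query on each call
```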
By default, the setup behavior works like this:
- Query the model for records
- Set the in-memory cache version to `Time.now`
- Index the records by `:id` (`finds_by` in configuration)
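Conceptually, that initial load is something like this sketch (plain Ruby, not the gem's actual internals):

```ruby
# Illustrative sketch only; these variable names are not FinderCache's internals.
records    = Category.all.to_a      # query the model for records
version    = Time.now               # the in-memory cache version
collection = records.index_by(&:id) # index by :id (or whatever finds_by is)
```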
When you perform an operation on a cache:
1. Check cache validity: is the cache's value equal to our `cache_key` value?
   a. Throttle this check: only hit the cache every 60 seconds to not hammer the cache (`ttl`)
2. If not valid, rebuild the collection of records
3. Return the value of the collection based on the `key` sent
   a. FinderCache will try to cast the value sent appropriately
   b. If it's not found, FinderCache will try to query the model for it based on `finds_by` and `scope`; the result is written back to the collection cache so a later lookup for it won't be a cache miss
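In plain Ruby, that lookup flow is roughly the sketch below. Everything in it (`FakeFinderCache`, `@last_checked_at`, the cache key format) is illustrative, not FinderCache's actual internals:

```ruby
# A sketch of the lookup flow described above, not the gem's real implementation.
# Key casting (3a) is omitted for brevity.
class FakeFinderCache
  def initialize(model, finds_by: :id, ttl: 60)
    @model, @finds_by, @ttl = model, finds_by, ttl
    @collection      = {}
    @version         = nil
    @last_checked_at = Time.at(0)
  end

  def find(key)
    # 1 + 1a: check validity, throttled so we only hit the cache every `ttl` seconds
    if Time.now - @last_checked_at > @ttl
      @last_checked_at = Time.now
      rebuild! if @version != shared_version # 2: rebuild when the versions disagree
    end
    # 3 + 3b: return from the collection; on a miss, query the model and store the result
    @collection[key] ||= @model.find_by(@finds_by => key)
  end

  private

  def rebuild!
    @version    = shared_version
    @collection = @model.all.index_by { |record| record.public_send(@finds_by) }
  end

  def shared_version
    Rails.cache.read("finder_cache/#{@model.name}") # hypothetical key format
  end
end
```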
We do 1a to avoid hammering your cache. FinderCache was originally built to withstand 500 operations per second on a startup's pre-Series-A budget, during a period when it was tough to fundraise.
Caches are otherwise lazy-loaded; to load one eagerly:

```ruby
Category.id_finder_cache.load!
```
Now that you know the behavior, let's customize it a bit:
How long to wait between checks of the cache version. Set to `0` to disable the throttle mechanism, which checks the version of your in-memory cache:
```ruby
class Category < ApplicationRecord
  has_finder_cache ttl: 0.seconds
end
```
Must quack like `ActiveSupport::Cache::Store`.
```ruby
FinderCacheStore = ActiveSupport::Cache::RedisCacheStore.new(url: ENV['REDIS_FINDER_CACHE_URL'])

class Category < ApplicationRecord
  has_finder_cache cache: FinderCacheStore # ur so nice to give it its own cache store :')
end
```
Limit which records get cached by passing a named scope:

```ruby
class Category < ApplicationRecord
  scope :active, -> { where(active: true) }

  has_finder_cache scope: :active
end
```
How long the built collection of records is valid for.
```ruby
class Category < ApplicationRecord
  has_finder_cache expires_in: 24.hours
end
```
You can also build the collection yourself with a block. There are side effects to this: the records won't be lazily looked up if they don't exist in the collection cache, and we can't cast the index keys.
```ruby
class Category < ApplicationRecord
  has_finder_cache do
    Category.all.index_by(&:something_cool)
  end
end
```
All together now:

```ruby
FinderCacheStore = ActiveSupport::Cache::RedisCacheStore.new(url: ENV['REDIS_FINDER_CACHE_URL'])

class Category < ApplicationRecord
  scope :active, -> { where(active: true) }

  has_finder_cache cache: FinderCacheStore, scope: :active, ttl: 3.seconds, expires_in: 24.hours
end
```
Can be done in an initializer (defaults shown):
```ruby
FinderCache.setup do |config|
  config.ttl = 60.seconds
  config.expires_in = 1.hour
  config.cache = Rails.cache
  config.finds_by = :id
end
```
It was a bright and sunny day outside when the error Slack channel shit the bed: `ActiveRecord::ConnectionPool` errors were littered across my screen, even though we had a connection pool of 500. Same went for Redis, our in-memory data structure store. Le sigh.
I spent some time looking into it, and it turned out our background processor was chewing through jobs at a high rate (a good problem to have). I could have upgraded the RDBMS to the next tier, but that cost as much as a developer's salary, so no: this startup was runway-sensitive, and I had to look for other options. Ultimately, I had to do one of the hardest things in computer science: caching things.
I knew that whatever mechanism held this cache had to be shared as closely as realistically possible between our services (web server, background jobs), so I couldn't just memoize a collection of objects and index them by some key in a hash: if the cache expired in one process, it had to expire in the other. A pub-sub mechanism was explored, but only explored, since it relied on an external dependency.
After some discussion, the team's gut instinct was to marshal the objects, put them into some data store, and fetch the necessary record from there. That'd be okay at low volume, but based on my decade-plus of experience creating bugs, I knew it would be impractical for our setup.
It's hella-performant.
If you're worried about memory footprint, it's extremely light.
More results can be found in `scripts/benchmark.csv`.
Contribution directions go here.
The gem is available as open source under the terms of the MIT License.