/scrooge

Fetch exactly what you need

Primary LanguageRuby

UPDATED

Please head on over to Stephen Sykes’s more recent implementation slim_scrooge – DB agnostic,
leaner and a native MRI callsite hash to negate Kernel#caller overheads

This is a complete rewrite from the initial coverage at igvita.com – read on below

Many thanks to Stephen Sykes ( pennysmalls.com ) for his time spent on shaping, implementing and troubleshooting this release.

An ActiveRecord optimization layer to ensure production Ruby applications only fetch the database content needed to minimize wire traffic, excessive SQL queries and reduce conversion overheads to native Ruby types.

Why bother ?

  • Object conversion and moving unnecessary data is both expensive and tax existing infrastructure in high load setups
  • Manually extracting and scoping SELECT clauses is not sustainable in a clean and painless manner with iterative development, even less so in large projects.
  • Preloading associations can be painful – delegate to Scrooge instead.

What it does


	Processing HotelsController#show (for 127.0.0.1 at 2009-03-18 19:29:38) [GET]
	  Parameters: {"action"=>"show", "id"=>"8699-radisson-hotel-waterfront-cape-town", "controller"=>"hotels"}
	  Hotel Load Scrooged (0.3ms)   SELECT `hotels`.id FROM `hotels` WHERE (`hotels`.`id` = 8699) 
	Rendering template within layouts/application
	Rendering hotels/show
	  SQL (0.2ms)   SELECT `hotels`.location_id,`hotels`.hotel_name,`hotels`.location,`hotels`.from_price,`hotels`.star_rating,`hotels`.apt,`hotels`.latitude,`hotels`.longitude,`hotels`.distance,`hotels`.narrative,`hotels`.telephone,`hotels`.important_notes,`hotels`.nearest_tube,`hotels`.nearest_rail,`hotels`.created_at,`hotels`.updated_at,`hotels`.id FROM `hotels` WHERE `hotels`.id IN ('8699')
	  Image Load Scrooged (0.2ms)   SELECT `images`.id FROM `images` WHERE (`images`.hotel_id = 8699) LIMIT 1
	  SQL (0.2ms)   SELECT `images`.hotel_id,`images`.title,`images`.url,`images`.width,`images`.height,`images`.thumbnail_url,`images`.thumbnail_width,`images`.thumbnail_height,`images`.has_thumbnail,`images`.created_at,`images`.updated_at,`images`.id FROM `images` WHERE `images`.id IN ('488')
	Rendered shared/_header (0.0ms)
	Rendered shared/_navigation (0.2ms)
	  Image Load Scrooged (0.2ms)   SELECT `images`.id FROM `images` WHERE (`images`.hotel_id = 8699) 
	  SQL (0.2ms)   SELECT `images`.hotel_id,`images`.title,`images`.url,`images`.width,`images`.height,`images`.thumbnail_url,`images`.thumbnail_width,`images`.thumbnail_height,`images`.has_thumbnail,`images`.created_at,`images`.updated_at,`images`.id FROM `images` WHERE `images`.id IN ('488')
	  Address Columns (306.2ms)   SHOW FIELDS FROM `addresses`
	  Address Load Scrooged (3.6ms)   SELECT `addresses`.id FROM `addresses` WHERE (`addresses`.hotel_id = 8699) LIMIT 1
	Rendered hotels/_show_sidebar (313.2ms)
	Rendered shared/_footer (0.1ms)
	Completed in 320ms (View: 8, DB: 311) | 200 OK [http://localhost/hotels/8699-radisson-hotel-waterfront-cape-town]


	Processing HotelsController#show (for 127.0.0.1 at 2009-03-18 19:29:40) [GET]
	  Parameters: {"action"=>"show", "id"=>"8699-radisson-hotel-waterfront-cape-town", "controller"=>"hotels"}
	  Hotel Load Scrooged (0.3ms)   SELECT `hotels`.narrative,`hotels`.from_price,`hotels`.star_rating,`hotels`.hotel_name,`hotels`.id FROM `hotels` WHERE (`hotels`.`id` = 8699) 
	  Address Load Scrooged (0.2ms)   SELECT `addresses`.id FROM `addresses` WHERE (`addresses`.hotel_id = 8699) 
	Rendering template within layouts/application
	Rendering hotels/show
	  Image Load Scrooged (0.3ms)   SELECT `images`.url,`images`.id,`images`.height,`images`.width FROM `images` WHERE (`images`.hotel_id = 8699) LIMIT 1
	Rendered shared/_header (0.1ms)
	Rendered shared/_navigation (0.2ms)
	  Image Load Scrooged (0.3ms)   SELECT `images`.thumbnail_width,`images`.id,`images`.thumbnail_height,`images`.thumbnail_url FROM `images` WHERE (`images`.hotel_id = 8699) 
	Rendered hotels/_show_sidebar (1.0ms)
	Rendered shared/_footer (0.1ms)
	Completed in 8ms (View: 5, DB: 1) | 200 OK [http://localhost/hotels/8699-radisson-hotel-waterfront-cape-town]
	

Suggested Use

Install, and you’re off to the races!

Installation

As a Rails plugin ( Recommended )

./script/plugin install git://github.com/methodmissing/scrooge.git

From Git

git pull git://github.com/methodmissing/scrooge.git

As a Gem

sudo gem install methodmissing-scrooge -s http://gems.github.com

Stability

The whole Rails 2.3.2 ActiveRecord test suite passes with scrooge, except for 13 failures related to callsite augmentation (note the SQL reload snippets below). Thoughts on handling or circumventing this much appreciated.


  2) Failure:
test_finding_with_includes_on_belongs_to_association_with_same_include_includes_only_once(EagerAssociationTest)
    [/opt/local/lib/ruby/gems/1.8/gems/activerecord-2.3.1/test/cases/../../lib/active_record/test_case.rb:31:in `assert_queries'
     /opt/local/lib/ruby/gems/1.8/gems/activerecord-2.3.1/test/cases/associations/eager_test.rb:139:in `test_finding_with_includes_on_belongs_to_association_with_same_include_includes_only_once'
     /opt/local/lib/ruby/gems/1.8/gems/activesupport-2.3.1/lib/active_support/testing/setup_and_teardown.rb:57:in `__send__'
     /opt/local/lib/ruby/gems/1.8/gems/activesupport-2.3.1/lib/active_support/testing/setup_and_teardown.rb:57:in `run']:
5 instead of 3 queries were executed.
Queries:
SELECT `posts`.id,`posts`.type FROM `posts` WHERE (`posts`.`id` = 1) 
SELECT `posts`.author_id,`posts`.title,`posts`.body,`posts`.comments_count,`posts`.taggings_count FROM `posts` WHERE (`posts`.`id` = 1) 
SELECT `authors`.name,`authors`.id FROM `authors` WHERE (`authors`.`id` = 1) 
SELECT `authors`.author_address_id,`authors`.author_address_extra_id FROM `authors` WHERE (`authors`.`id` = 1) 
SELECT `author_addresses`.id FROM `author_addresses` WHERE (`author_addresses`.`id` = 1) .
<3> expected but was
<5>.

To run tests in your environment :

  • Configure to run the ActiveRecord test suite as per the docs
  • ‘rake test’ from within the scrooge root directory
  • It’ll attempt to find the path to the ActiveRecord test cases through rubygems
  • Known to work with both 2.2.2, 2.3.0 and the upcoming 2.3.1

Initial Benchmarks

Passenger, Rails 2.2.2, remote DB :


Without scrooge:

Concurrency Level:      1
Time taken for tests:   68.279156 seconds
Complete requests:      150
Failed requests:        0
Write errors:           0
Total transferred:      13741201 bytes
HTML transferred:       13679100 bytes
Requests per second:    2.20 [#/sec] (mean)
Time per request:       455.194 [ms] (mean)
Time per request:       455.194 [ms] (mean, across all concurrent requests)
Transfer rate:          196.53 [Kbytes/sec] received

With scrooge:

Concurrency Level:      1
Time taken for tests:   58.162039 seconds
Complete requests:      150
Failed requests:        0
Write errors:           0
Total transferred:      13747200 bytes
HTML transferred:       13685100 bytes
Requests per second:    2.58 [#/sec] (mean)
Time per request:       387.747 [ms] (mean)
Time per request:       387.747 [ms] (mean, across all concurrent requests)
Transfer rate:          230.82 [Kbytes/sec] received

How it works

Callsites

Ruby allows introspection of the call tree through


  Kernel#caller
	

Scrooge analyzes the last 10 calltree elements that triggered


  ActiveRecord::Base.find_by_sql
	

Lets refer to that as a callsite, or signature.

Thus given SQL such as


"SELECT * FROM `images` WHERE (`images`.hotel_id = 11697)  LIMIT 1"
	

Called from our application helper


["/Users/lourens/projects/superbreak_app/vendor/plugins/scrooge/rails/../lib/scrooge.rb:27:in `find_by_sql'", "/Users/lourens/projects/superbreak_app/vendor/rails/activerecord/lib/active_record/base.rb:1557:in `find_every'", "/Users/lourens/projects/superbreak_app/vendor/rails/activerecord/lib/active_record/base.rb:1514:in `find_initial'", "/Users/lourens/projects/superbreak_app/vendor/rails/activerecord/lib/active_record/base.rb:613:in `find'", "/Users/lourens/projects/superbreak_app/vendor/rails/activerecord/lib/active_record/associations/association_collection.rb:60:in `find'", "/Users/lourens/projects/superbreak_app/vendor/rails/activerecord/lib/active_record/associations/association_collection.rb:67:in `first'", "/Users/lourens/projects/superbreak_app/app/helpers/application_helper.rb:60:in `hotel_image'", "/Users/lourens/projects/superbreak_app/app/views/hotels/_hotel.html.erb:4:in `_run_erb_app47views47hotels47_hotel46html46erb_locals_hotel_hotel_counter_object'", "/Users/lourens/projects/superbreak_app/vendor/rails/actionpack/lib/action_view/renderable.rb:36:in `send'", "/Users/lourens/projects/superbreak_app/vendor/rails/actionpack/lib/action_view/renderable.rb:36:in `render'", "/Users/lourens/projects/superbreak_app/vendor/rails/actionpack/lib/action_view/renderable_partial.rb:20:in `render'"]
	

We can generate a unique callsite identifier with the following calculation :


(The above calltree << "SELECT * FROM `images` ).hash " # cut off conditions etc.

Callsites are tracked on a per model ( table name ) basis.

Scope

Only SQL statements that meet the following criteria is considered for column optimizations :

  • A SELECT statement
  • Not a JOIN
  • The Model has a primary key defined

Only associations that meet the following criteria is associated with a callsite and preloaded
on subsequent requests :

  • Not a polymorphic association
  • Not a collection ( has_many etc. )

How it tracks

The ActiveRecord attributes Hash is replaced with a proxy that automatically augments the callsite with any attributes referenced through the Hash lookup keys. We’re also able to learn which associations is invoked from a given callsite, for preloading on subsequent requests.

Storage

There’s a slight memory hit for each model as the callsites is stored as a class level Hash, which is relatively lightweight and looks like this :


#<Scrooge::Callsite:0x3969968 @primary_key="id", @columns=#<Set: {"narrative", "from_price", "star_rating", "hotel_name", "id"}>, @associations=#<Set: {:address}>, @signature=-736202783, @klass=Hotel(id: integer, location_id: integer, hotel_name: string, location: string, from_price: float, star_rating: integer, apt: boolean, latitude: float, longitude: float, distance: float, narrative: text, telephone: string, important_notes: text, nearest_tube: string, nearest_rail: string, created_at: datetime, updated_at: datetime), @inheritance_column="type">

Tracking and scoping ?

The tracking and scoping phases is superseded by this implementation – none of those hindrances anymore.

Todo

  • Deeper coverage for Scrooge::AttributesProxy, possible handling of replace
  • More test cases for Scrooge::Callsite
  • Have invoking Model#attributes not associate all columns with the callsite
  • Avoid possible missing attribute exceptions for destroyed objects
  • Track rows of result set to allow more targeted loading of associations for a callsite

© 2009 Lourens Naudé (methodmissing) and Stephen Sykes (sdsykes)