
Scraping library based on Capybara. Modules provide DSLs specific to certain tasks and websites.

     _        _
     /        \    /)     _____                             __                    
     \o      o/   ((     / ___/______________ _____  __  __/ /_  ____ __________ _
     /\      /\    ))    \__ \/ ___/ ___/ __ `/ __ \/ / / / __ \/ __ `/ ___/ __ `/
    /==\ () /==\  //    ___/ / /__/ /  / /_/ / /_/ / /_/ / /_/ / /_/ / /  / /_/ / 
   |    `UU`    |//    /____/\___/_/   \__,_/ .___/\__, /_.___/\__,_/_/   \__,_/  
   |            |/                         /_/    /____/
 .-'\          /'-.
(((` ) |----| ( `)))
    (((`    `)))


A modular web scraping framework based on Capybara, capybara-webkit, and poltergeist.


  • Capybara DSL and drivers
  • Modular plugins for scraping specific sites
  • Additional utility methods to simplify your scraping efforts
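
As a rough illustration of the modular plugin idea, the sketch below shows one way a site-specific module could add its own small DSL to a shared base scraper. It is plain Ruby with no dependencies, and every name in it (ExampleSitePlugin, BaseScraper, ExampleSiteScraper, listing_url, the example.com host) is hypothetical and not taken from Scrapybara's code; only the @app_host instance variable mirrors what the library's scrapers expose.

```ruby
# Hypothetical sketch: none of these class or module names come from Scrapybara.

# A site-specific plugin: a module that adds a small DSL for one website.
module ExampleSitePlugin
  # Build the URL of a listing page on the (made-up) site.
  def listing_url(page)
    "#{app_host}/listings?page=#{page}"
  end
end

# A minimal base scraper that holds state shared by all plugins.
class BaseScraper
  attr_reader :app_host

  def initialize(app_host)
    @app_host = app_host
  end
end

# A concrete scraper opts in to the site DSL by including the module.
class ExampleSiteScraper < BaseScraper
  include ExampleSitePlugin

  def initialize
    super('http://www.example.com')
  end
end

scraper = ExampleSiteScraper.new
puts scraper.listing_url(2)
```

Keeping each site's helpers in its own module means a new site can be supported without touching the base class.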



  • Ruby 1.9/2.0
  • libxml2
  • libxslt
  • Qt (required by capybara-webkit)
  • PhantomJS (required by poltergeist)

Using Bundler

The simplest way to install Scrapybara is to use Bundler.

Add Scrapybara to your Gemfile:

gem 'scrapybara'

Or install the gem manually:

gem install scrapybara



Note: You'll need to load it manually from irb until this is packaged as a gem. For example:

cd /path/to/scrapybara
irb -Ilib -rscraper

You can now access the libraries inside IRB:

scraper = Scraper::Edgar.new
#=> #<Scraper::Edgar:0x007fbf478f73a8 @app_host="http://www.sec.gov">

# import most recent filings

# import filings for given day

# query mongoid db using a named scope, see more: http://mongoid.org/en/mongoid/docs/querying.html
form_10ks = Scraper::Edgar::Filing.form_10k

# view documents for given filing:
most_recent_10k = Scraper::Edgar::Filing.form_10k.last

#=> []


Generate the documentation with YARD:

  yardoc 'lib/*.rb' 'lib/**/*.rb' 'lib/**/**/*.rb'
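
For contributors new to YARD, the sketch below shows the kind of doc comment that yardoc picks up. The @param, @return, and @example tags are standard YARD tags; the ExampleHelpers module and its squish method are hypothetical and exist only to illustrate the comment format.

```ruby
# Hypothetical helper, used only to illustrate YARD tags.
module ExampleHelpers
  # Normalizes whitespace in scraped text.
  #
  # @param text [String] raw text taken from a page
  # @return [String] the text with each run of whitespace collapsed to one space
  # @example
  #   ExampleHelpers.squish("  Form \n 10-K ") #=> "Form 10-K"
  def self.squish(text)
    text.strip.gsub(/\s+/, ' ')
  end
end

puts ExampleHelpers.squish("  Form \n 10-K ")
```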

To do

  • better test coverage
  • more field validations of models
  • more plugins


A lot of new contributors ask, "Well, where do I start?" Below are some links to comprehensive resources that help newcomers get up to speed and dive right in to fixing bugs and adding features.

How to Contribute the Right Way

We try to stick to a set of guidelines when it comes to contributing code. When you're writing a bugfix or custom code from scratch, it's good practice to ask yourself:

Other helpful resources

Below are some relevant links to other parts of the wiki. We're currently restructuring everything, so these links may change.

Thank you to the Diaspora project for the basic ideas on how to structure the README and wikis.

Ruby Interpreter Compatibility

Scrapybara has been tested on the following Ruby interpreters:

  • MRI 1.9.3
  • MRI 2.0.0


  • Source hosted on GitHub.
  • Direct questions and discussions to the IRC channel.
  • Report issues on GitHub Issues.
  • Pull requests are very welcome! Please include spec and/or feature coverage for every patch, and create a topic branch for every separate change you make.
  • See the Contributing guide for instructions on running the specs and features.
  • Documentation is generated with YARD (cheat sheet). To generate while developing:
yard server --reload