This project is an experiment in working with large datasets of semi-formatted data made available by the OpenLibrary.org project.
The project experiments with the following technologies:
- Ruby on Rails (4.1.x) as an API server and to handle the underlying ETL (extract / transform / load) operations
- JSON API, a specification for formatting JSON responses
- Ember.js (2.5.x) as a single page JavaScript framework
- D3.js (4.1.x) for visualizations
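For readers new to JSON API, here is a minimal Ruby sketch of the document shape the specification (jsonapi.org) describes: a top-level "data" member holding a resource object with string "type" and "id" members plus an "attributes" hash. The helper name, the "authors" type and the "name" attribute are hypothetical illustrations, not this project's actual serializer output.

```ruby
require "json"

# Hypothetical helper: wraps one author record in a JSON API document.
# The "authors" type and "name" attribute are illustrative assumptions.
def author_document(id, name)
  {
    "data" => {
      "type" => "authors",  # resource type (plural, following the jsonapi.org examples)
      "id" => id.to_s,      # JSON API requires "id" values to be strings
      "attributes" => { "name" => name }
    }
  }
end

doc = author_document("OL0000001A", "Example Author")
puts JSON.generate(doc)
```

Ember.js front ends typically consume documents of this shape through Ember Data's JSON API adapter, which is why the two technologies are listed together.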
This project is available under the MIT license. Use of the OpenLibrary.org data files (the Open Library Data Dumps) is subject to the Internet Archive's Terms of Use.
This is a Ruby on Rails project using PostgreSQL. At a high level, the following steps are required:
- Set up your environment so that Ruby on Rails can run locally.
- Clone this repo into a project directory
- Install your gems (e.g. bundle install).
- Set up a PostgreSQL database server. Create a user that has CREATE privileges if you want the Rails project to build your database during setup (or, in development, simply make your user a superuser).
- Create config/database.yml using the login settings from your database setup. There is an example config file named config/database.yml.example that can be used as a template.
- Initialize your database:
rake db:create
rake db:migrate
- Fire up the Rails server (e.g. rails s -p 3030). If everything is working correctly, you should see an empty front end at localhost:3030.
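The config/database.yml step can be illustrated with a minimal Ruby sketch that embeds a typical PostgreSQL development section and checks that a few keys are present. The database name, user name, password and the choice of "required" keys are hypothetical; the project's config/database.yml.example remains the authoritative template.

```ruby
require "yaml"

# Typical shape of a PostgreSQL section in config/database.yml.
# All names and credentials here are hypothetical placeholders.
DATABASE_YML = <<~YAML
  development:
    adapter: postgresql
    encoding: unicode
    database: openlibrary_development
    username: openlibrary
    password: secret
    host: localhost
YAML

# Hypothetical sanity check: keys this sketch assumes a section should define.
REQUIRED_KEYS = %w[adapter database username].freeze

# Returns the required keys missing from one environment's section.
def missing_keys(yaml_text, env)
  section = YAML.safe_load(yaml_text).fetch(env)
  REQUIRED_KEYS.reject { |key| section.key?(key) }
end

p missing_keys(DATABASE_YML, "development") # prints []
```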
WARNING: This project is essentially just a front-end tool for three large data files. The raw files total approximately 45 GB, and the processed import generates approximately 280M rows. The full ingest (not counting file downloads) can take over 24 hours on a mid-range desktop computer.
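Because files of this size cannot reasonably be loaded into memory, an importer has to stream them line by line. The Ruby sketch below assumes the tab-separated layout OpenLibrary documents for its dumps (type, key, revision, last_modified, then the full record as JSON); the helper names and sample values are hypothetical and are not taken from this project's ETL code.

```ruby
require "json"

# One parsed dump line; the five fields follow the assumed dump layout.
DumpRecord = Struct.new(:type, :key, :revision, :last_modified, :data)

# Hypothetical helper: split on tabs (at most 5 fields) and decode the JSON column.
def parse_dump_line(line)
  type, key, revision, last_modified, json = line.chomp.split("\t", 5)
  DumpRecord.new(type, key, revision.to_i, last_modified, JSON.parse(json))
end

# Hypothetical helper: File.foreach reads one line at a time, so a
# multi-gigabyte dump is never held in memory all at once.
def each_dump_record(path)
  File.foreach(path) { |line| yield parse_dump_line(line) }
end

# Made-up sample line in the assumed format.
sample = "/type/author\t/authors/OL1A\t3\t2008-04-01T03:28:50.625462\t" \
         '{"key": "/authors/OL1A", "name": "Example Author"}'
record = parse_dump_line(sample)
puts record.data["name"] # prints Example Author
```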
Download the three individual data files (ol_dump_authors_latest.txt.gz, ol_dump_works_latest.txt.gz and ol_dump_editions_latest.txt.gz) from OpenLibrary.org and extract them into a directory (e.g. /path/to/downloads/OpenLibrary/). Run the following rake tasks:
rake etl:load_authors[/path/to/downloads/OpenLibrary]
rake etl:load_works[/path/to/downloads/OpenLibrary]
rake etl:load_editions[/path/to/downloads/OpenLibrary]
rake generate_tokens
Alternatively, run the catch-all task:
rake load_all[/path/to/downloads/OpenLibrary]
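The bracketed directory in these commands is a standard Rake task argument. As a hedged sketch of how such a task can receive it, the Ruby below defines a stand-alone etl:load_authors task that only prints the file it would load; the body and the extracted file name are hypothetical, not this project's implementation. A real Rails task would usually also depend on :environment (task :load_authors, [:path] => :environment) so the application and database connection are loaded.

```ruby
require "rake"

# Hypothetical stand-alone version of an argument-taking task.
namespace :etl do
  desc "Load authors from an OpenLibrary dump directory (illustrative only)"
  task :load_authors, [:path] do |_task, args|
    dir = args[:path] || abort("usage: rake etl:load_authors[/path/to/dir]")
    # Assumed name of the extracted authors dump.
    file = File.join(dir, "ol_dump_authors_latest.txt")
    puts "would load #{file}"
  end
end

# Equivalent to running: rake etl:load_authors[/path/to/downloads/OpenLibrary]
Rake::Task["etl:load_authors"].invoke("/path/to/downloads/OpenLibrary")
```

In shells such as zsh, the square brackets are treated as glob characters, so quote the whole task name, e.g. rake 'load_all[/path/to/downloads/OpenLibrary]'.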
To do:
- Animate chart rendering
- Add Liquid Fire for route transitions
- Intelligent wrap of pie chart labels
- Sidebar visualization of terms
The project appears to work under Node 10.x after updating Rails to 4.2 and pinning ember-inflector to 2.2.0.