/mtb-scrape

Scrape and organize foromtb.com data in an accessible way

mtb-scrape is a project that scrapes the second-hand bicycle marketplace foromtb.com, extracts the listing data, and organizes it so that it is easy to search and filter. More details here.

Development

To build the project for development:

Install dependencies with bower and bundler:

bundle install
bower install

Then copy all assets to the /public directory:

rake copy_assets

And run migrations (this is a custom rake task, not the Rails rake db:migrate task):

rake migrate

This can all be done in one step with

rake build
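
As a rough illustration of how a single task can chain the steps above, here is a minimal Rakefile-style sketch. It is not the project's actual Rakefile: the task bodies are hypothetical placeholders that only record that they ran, and whether the real build task also installs dependencies is not confirmed here. The require line is only needed to run the sketch as a standalone script.

```ruby
require 'rake'

# Hypothetical log used only to show the order in which the tasks run.
RAN = []

# Placeholder bodies; the real tasks copy assets and run the custom migrations.
task :copy_assets do
  RAN << :copy_assets
end

task :migrate do
  RAN << :migrate
end

# Declaring prerequisites makes `rake build` run them in the listed order.
task build: [:copy_assets, :migrate]

Rake::Task['build'].invoke
```

Listing the steps as prerequisites keeps each one runnable on its own while still giving a one-command setup.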

Running the code

Serve the Sinatra web app with

ruby app.rb

There is also a thor CLI provided with the following commands:

$ thor list
mtb_cli
-------
thor mtb_cli:add_brand NAME    # Add new brand to the DB
thor mtb_cli:parse_posts       # Build or update bikes from all new or updated posts
thor mtb_cli:reparse           # Rebuild bikes from their post data
thor mtb_cli:scrape NUM_PAGES  # Scrape the n first pages from foromtb
thor mtb_cli:update            # Scrape last 5 pages of foromtb and update bike infor...
thor mtb:scrape

Data

The application recognizes and creates bikes based on lists of brands and models stored in the database. You can get this data from SQL dumps hosted on mlovic.com. To import the data into a Postgres database:

curl http://mlovic.com/mtb-scrape/brands.sql > brands.sql
psql databasename < brands.sql
psql databasename < models.sql
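
To make the idea of recognizing bikes from brand and model lists more concrete, here is a minimal sketch in plain Ruby. It is an assumption about the general approach, not the project's actual parser: CATALOG, Bike, and recognize are hypothetical names, and the hash stands in for the brands and models tables.

```ruby
# Hypothetical in-memory stand-in for the brands and models tables.
CATALOG = {
  'Specialized' => ['Enduro', 'Stumpjumper'],
  'Trek'        => ['Remedy', 'Slash']
}.freeze

Bike = Struct.new(:brand, :model)

# Return a Bike when the post title mentions a known brand, otherwise nil.
# The model is left as nil when no known model of that brand appears.
def recognize(title, catalog = CATALOG)
  text = title.downcase
  brand, models = catalog.find { |name, _| text.include?(name.downcase) }
  return nil unless brand

  model = models.find { |m| text.include?(m.downcase) }
  Bike.new(brand, model)
end

bike = recognize('Vendo Trek Slash 2015 talla M')
puts "#{bike.brand} #{bike.model}" if bike
```

A post whose title names no known brand yields nil, which is one reason the brand and model data needs to be imported before parsing posts.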

Contributing

Contributions are very welcome.

For specs that require the database/ActiveRecord, require 'spec_helper' and add the loads_DB: true RSpec metadata to the example group. If the spec can run without ActiveRecord, leave these out for faster tests!

TODO

  • Allow users to save searches and receive notifications for new bikes

  • Implement submodels:

  • Abandon ActiveRecord pattern to further isolate the project's parts

  • Expand scope to also track bike parts

  • scheduler

  • run tests

  • where to download brand and model data