Web Crawler

A single-threaded web crawler that extracts the URLs of static assets (those hosted on the crawled domain) from each page it visits.
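
As a rough illustration of what "single-threaded" means here, the sketch below visits pages one at a time from a queue and only follows links on the starting host. It is not taken from crawl.rb: the method name `crawl_order`, the `fetch_links` callable, and the in-memory `SITE` hash are all assumptions made for the example, and a real run would fetch pages over HTTP instead.

```ruby
require 'set'
require 'uri'

# Hypothetical helper (not from crawl.rb): visits pages breadth-first, one at
# a time, and stays on the host of the start URL. `fetch_links` is an assumed
# callable that returns the links found on a given page.
def crawl_order(start_url, fetch_links)
  host = URI.parse(start_url).host
  visited = Set.new
  queue = [start_url]
  order = []

  until queue.empty?
    url = queue.shift
    next if visited.include?(url)

    visited << url
    order << url

    fetch_links.call(url).each do |link|
      # Resolve relative links against the current page.
      absolute = URI.join(url, link).to_s
      same_host = URI.parse(absolute).host == host
      queue << absolute if same_host && !visited.include?(absolute)
    end
  end

  order
end

# Tiny in-memory "site" standing in for real HTTP requests (assumption).
SITE = {
  'http://www.example.org/' => ['/about', 'http://other.example.com/'],
  'http://www.example.org/about' => ['/']
}.freeze

puts crawl_order('http://www.example.org/', ->(url) { SITE.fetch(url, []) })
```

Because there is a single queue and no threads, pages are processed strictly in the order they are discovered, and the visited set keeps the crawler from looping on pages that link back to each other.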

The output is printed to the terminal in the following JSON format:

[
  {
    "url": "http://www.example.org",
    "assets": [
      "http://www.example.org/image.jpg",
      "http://www.example.org/script.js"
    ]
  },
  {
    "url": "http://www.example.org/about",
    "assets": [
      "http://www.example.org/company_photo.jpg",
      "http://www.example.org/script.js"
    ]
  }
]
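
One possible way to build an entry of this shape is sketched below. It is only an illustration under stated assumptions: `page_entry` and `ASSET_TAG` are hypothetical names, and the regular expression is a deliberately naive stand-in used to keep the example free of dependencies. The project itself may well use a proper HTML parser.

```ruby
require 'json'
require 'uri'

# Naive pattern (assumption): captures the src/href value of <img>, <script>,
# and <link> tags. A real HTML parser would be more robust.
ASSET_TAG = /<(?:img|script|link)\b[^>]*?\b(?:src|href)\s*=\s*["']([^"']+)["']/i

# Hypothetical helper: returns one entry of the output array for a page.
def page_entry(page_url, html)
  host = URI.parse(page_url).host
  assets = html.scan(ASSET_TAG).flatten
               .map { |ref| URI.join(page_url, ref).to_s }     # make absolute
               .select { |asset| URI.parse(asset).host == host } # same domain only
               .uniq
  { 'url' => page_url, 'assets' => assets }
end

# Sample markup (assumption) with two on-domain assets and one external one.
SAMPLE_HTML = '<img src="/image.jpg">' \
              '<script src="http://www.example.org/script.js"></script>' \
              '<script src="http://cdn.example.com/lib.js"></script>'

puts JSON.pretty_generate([page_entry('http://www.example.org', SAMPLE_HTML)])
```

The script hosted on `cdn.example.com` is dropped because its host differs from the page's host, which matches the "linked to the domain" restriction described earlier.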

Installation instructions

  1. Install Bundler (skip this step if it is already installed):
gem install bundler
  2. Run bundle install in the root folder (web-crawler) to install all Ruby dependencies:
bundle install
  3. Run the program crawl.rb (requires Ruby >= 2):
ruby crawl.rb [url]

Run tests

bundle exec rspec