/redback

A spider for Ruby

Primary LanguageRubyMIT LicenseMIT

redback

Redback is a Ruby spider (geddit?). Pass it a website, and it will begin its many-legged crawl, scurrying across the site to pull out all the unique URLs it can find.

Just like a terrifying real-life spider, redback aims to be fast: in particular, it sends requests in parallel so one slow page won't slow down your crawl.

Installation

$ gem install redback

Usage

Command line

$ redback http://example.com/

…in which case it will print all the URLs it finds within the site http://example.com/.

You can output he results to a file like this:

$ redback http://example.com > output.txt

Or feed them to another command line tool like this:

$ redback http://xkcd.com | grep xml

Within Ruby

It can also be used as a library:

require 'redback'

Redback.new "http://example.com" { |url| puts url }

The Redback.new method accepts a URL and a block; the block will be executed for each URL found.