You need clone application :
$ cd workspace #directory where will be current project
$ git clone https://git@github.com:valexl/crawler_dmoz.git
$ cd crawler_sketch
You need make bundle install before:
$ bundle install
You need load all industries via rake task:
$ rake data:load:industries
Test that everything is well:
$ rspec spec
Try it in irb:
$ irb -r ./boot.rb
Example how to use:
path = "#{Dir.pwd}/data/content.rdf.u8" # content.rdf.u8 - is file downloaded from mounthly backups - http://rdf.dmoz.org/
parser = DMOZParser.new path
parser.load!