Prompt API - Scraper API - Ruby Package

scraper_rb is a simple Ruby wrapper for scraper-api.

Requirements

  1. Sign up for Prompt API.
  2. Subscribe to scraper-api; the test drive is free.
  3. Set the PROMPTAPI_TOKEN environment variable after subscribing.

Then install the gem:

$ gem install scraper_rb

Or install from GitHub:

$ gem install scraper_rb --version "0.1.2" --source "https://rubygems.pkg.github.com/promptapi"
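
Requirement 3 is easy to miss, so a script may want to fail early when the token is absent. The following is a minimal sketch, assuming only that the token lives in PROMPTAPI_TOKEN as stated above. promptapi_token is a hypothetical helper and is not part of scraper_rb:

```ruby
# Hypothetical helper (not part of scraper_rb): return the token or fail loudly.
# `env` defaults to the process environment; a plain Hash can be passed for testing.
def promptapi_token(env = ENV)
  token = env['PROMPTAPI_TOKEN'].to_s.strip
  raise KeyError, 'PROMPTAPI_TOKEN is not set' if token.empty?

  token
end
```

Calling promptapi_token before creating a scraper turns a missing token into an immediate, readable error.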

Example Usage

Basic scraper:

require "scraper_rb"

s = ScraperRb.new('https://pypi.org/classifiers/') # no params
s.get
s.response
# {
#     :headers=>{:"Content-Length"=>...}, 
#     :url=>"https://pypi.org/classifiers/",
#     :data=>"<!DOCTYPE html>\n<html> ...",
# }

s.response[:headers]     # => return response headers
s.response[:data]        # => return scraped html
s.save('/tmp/data.html') # => {:file=>"/tmp/data.html", :size=>321322}

# or

save_result = s.save('/tmp/data.html')
puts save_result[:error] if save_result.key?(:error) # we have a file error

You can add URL parameters for extra operations. Valid parameters are:

  • auth_password: password for HTTP Realm auth
  • auth_username: username for HTTP Realm auth
  • cookie: URL-encoded cookie header
  • country: 2-character country code, if you wish to scrape from an IP address in a specific country
  • referer: HTTP referer header
  • selector: CSS-style selector path such as a.btn div li. If a selector is given, the returned result is a collection of data and the saved file is in .json format.
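
A typo in a parameter name is hard to spot, so it can help to check a params Hash against the list above before using it. This is only a sketch: VALID_PARAMS and split_params are hypothetical names, not part of scraper_rb, and how the gem itself treats unknown keys is not documented here.

```ruby
# Parameter names copied from the list above.
VALID_PARAMS = %i[auth_password auth_username cookie country referer selector].freeze

# Hypothetical helper: split a params Hash into [known_params, unknown_keys].
def split_params(params)
  known, unknown = params.partition { |key, _value| VALID_PARAMS.include?(key.to_sym) }
  [known.to_h, unknown.map(&:first)]
end

known, unknown = split_params(country: 'EE', selektor: 'ul li')
# known   => {:country=>"EE"}
# unknown => [:selektor]
```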

Here is an example using URL parameters and a selector:

require "scraper_rb"

params = {country: 'EE', selector: 'ul li button[data-clipboard-text]'}
s = ScraperRb.new('https://pypi.org/classifiers/', params)
s.get
s.response[:headers]       # => return response headers
s.response[:data]          # => return an array, collection of given selector
s.response[:data].length   # => 734 
s.save('/tmp/test.json')   # => {:file=>"/tmp/test.json", :size=>174449}

# or

save_result = s.save('/tmp/test.json')
puts save_result[:error] if save_result.key?(:error) # we have a file error
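
Both examples check the Hash returned by save for an :error key. That check can be wrapped once, as in the sketch below. describe_save is a hypothetical helper, and the shape of the :error value is an assumption, since only the :file and :size keys are shown above.

```ruby
# Hypothetical helper: turn the Hash returned by `save` into a one-line message.
def describe_save(result)
  if result.key?(:error)
    "save failed: #{result[:error]}"
  else
    "saved #{result[:size]} bytes to #{result[:file]}"
  end
end

# Using the sample return value shown above:
describe_save({ file: '/tmp/test.json', size: 174449 })
# => "saved 174449 bytes to /tmp/test.json"
```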

Default timeout value is set to 10 seconds. You can change this while initializing the instance:

s = ScraperRb.new('https://pypi.org/classifiers/', params={}, timeout=50) 
# => 50 seconds timeout w/o params

s = ScraperRb.new('https://pypi.org/classifiers/', params={country: 'EE'}, timeout=50) 
# => 50 seconds timeout
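
A note on the call style above: in Ruby, name=value inside a method call is a local variable assignment whose value is then passed positionally; it is not a keyword argument. The sketch below illustrates this with demo_new, a hypothetical stand-in that assumes positional optional arguments. The real ScraperRb.new signature is not shown in this README and may differ.

```ruby
# Hypothetical stand-in with positional optional arguments.
def demo_new(url, params = {}, timeout = 10)
  { url: url, params: params, timeout: timeout }
end

# `timeout = 50` assigns a local variable, then passes 50 as the SECOND argument.
skipped = demo_new('https://pypi.org/classifiers/', timeout = 50)
skipped[:params]  # => 50
skipped[:timeout] # => 10

# Passing every argument in order behaves as intended.
ordered = demo_new('https://pypi.org/classifiers/', {}, 50)
ordered[:timeout] # => 50
```

With a positional signature, arguments cannot be skipped by naming them, so it is safest to pass every argument in the documented order.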

You can add extra X- headers:

s = ScraperRb.new('https://pypi.org/classifiers/', headers={'X-Referer': 'https://www.google.com'}) 

# or
s = ScraperRb.new('https://pypi.org/classifiers/', params={country: 'EE'}, headers={'X-Referer': 'https://www.google.com'}, timeout=50) 
# => 50 seconds timeout

The headers param is a Hash of key/value pairs. Header keys must start with the X- prefix. More detail can be found on the Mozilla site.
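
The X- prefix rule can be checked before a request is made. The sketch below uses invalid_header_keys, a hypothetical helper that is not part of scraper_rb, and it assumes the prefix match is case-sensitive.

```ruby
# Hypothetical helper: list header keys that do not start with "X-".
# Keys may be Strings or Symbols ({'X-Referer': ...} produces a Symbol key).
def invalid_header_keys(headers)
  headers.keys.map(&:to_s).reject { |key| key.start_with?('X-') }
end

invalid_header_keys({ 'X-Referer': 'https://www.google.com', 'Accept' => 'text/html' })
# => ["Accept"]
```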


Development

After checking out the repo, run bin/setup to install dependencies. Then, run rake test to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.

To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and tags, and push the .gem file to rubygems.org.

$ rake -T

rake build            # Build scraper_rb-X.X.X.gem into the pkg directory
rake clean            # Remove any temporary products
rake clobber          # Remove any generated files
rake install          # Build and install scraper_rb-X.X.X.gem into system gems
rake install:local    # Build and install scraper_rb-X.X.X.gem into system gems without network access
rake release[remote]  # Create tag v0.0.0 and build and push scraper_rb-X.X.X.gem to rubygems.org
rake test             # Run tests
  • If PROMPTAPI_TOKEN is set, tests based on real HTTP requests become available.
  • Set RUBY_DEVELOPMENT to 1 for more verbose test results.

License

This project is licensed under the MIT License.


Contributor(s)


Contribute

Bug reports and pull requests are welcome on GitHub:

  1. Fork the repo (https://github.com/promptapi/scraper_rb/fork)
  2. Create your branch (git checkout -b my-feature)
  3. Commit your changes (git commit -am 'Add awesome features...')
  4. Push your branch (git push origin my-feature)
  5. Then create a new Pull Request!

This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the code of conduct.