A simple, yet powerful API to get the contents of a webpage (links, h1, h2, h3)
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes
What things you need to install the software and how to install them
Ruby 2.4.1 installed locally, or change the ruby line in Gemfile to your local version
In Gemfile
ruby '2.4.1'
Change it to:
ruby 'your_local_version'
Clone this repository
navigate to the cloned repo in your terminal
rake db:create
rake db:migrate
rails s
Now, if everything went perfectly you should be able to communicate with the server on port 3000
There are two functions for this API
Parse and save the contents of a specific url, eg. 'https://www.telemonetize.com'
curl -i -H "Accept: application/vnd.api+json" -H 'Content-Type:application/vnd.api+json' -X POST -d '{"data": {"type":"webpages", "attributes":{"url":"https://www.telemonetize.com”}}}’ http://localhost:3000/webpages
curl -i -H "Accept: application/vnd.api+json" "http://localhost:3000/webpages?include=contents&fields%5Bcontents”
Or in ruby
require 'net/http'
require 'uri'
uri = URI.parse("http://localhost:3000/webpages?include=contents")
request = Net::HTTP::Get.new(uri)
request["Accept"] = "application/vnd.api+json"
req_options = {
use_ssl: uri.scheme == "https",
}
response = Net::HTTP.start(uri.hostname, uri.port, req_options) do |http|
http.request(request)
end
# response.code
# response.body
The JSON response will look something like this
require 'net/http'
require 'uri'
require 'json'
uri = URI.parse("http://localhost:3000/webpages?include=contents")
request = Net::HTTP::Get.new(uri)
request["Accept"] = "application/vnd.api+json"
req_options = {
use_ssl: uri.scheme == "https",
}
response = Net::HTTP.start(uri.hostname, uri.port, req_options) do |http|
http.request(request)
end
puts response.code
json = JSON.parse(response.body)
json['data'].each do |data|
puts "\n\n\n\n\n"+data['attributes']['url']
# Get the content IDS
content_ids = []
data['relationships']['contents']['data'].each do |content|
content_ids.push(content['id'])
end
# Print the contents
json['included'].each do |content|
if content_ids.include? content['id']
puts '--> ' + content['attributes']['wp-content-type'].to_s + ': ' + content['attributes']['data'].to_s
end
end
end
- Ruby On Rails - The web framework used
- jsonapi-resources - JSON API library
- Nokogiri - Used for web scraping
- Alexander Sideris - Initial work - (https://github.com/alexandersideris)
See also the list of contributors who participated in this project.
This project is licensed under the MIT License - see the LICENSE.md file for details