The goal is to extract a list of Van Gogh paintings from the attached Google search results page.
The script uses Nokogiri to parse the HTML after reading in the HTML file, CGI to unescape the necessary properties, OpenURI to open the remote HTML file, and JSON to write the resulting hash to the .json file.
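As an illustration of the unescaping step, here is a minimal sketch using only Ruby's standard CGI library. The sample strings and variable names are made up for this example, and which properties the scraper actually unescapes is an assumption.

```ruby
require 'cgi'

# Hypothetical raw values, as they might appear inside HTML attributes.
raw_href  = '/search?q=The+Starry+Night&amp;stick=abc'
raw_title = 'The+Starry+Night%2C+1889'

# CGI.unescapeHTML turns HTML entities such as &amp; back into characters.
href = CGI.unescapeHTML(raw_href)

# CGI.unescape decodes URL encoding: '+' becomes a space, %2C becomes ','.
title = CGI.unescape(raw_title)

puts href
puts title
```

Which of the two calls applies depends on whether the value is HTML-escaped, URL-encoded, or both.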
How to run the program:
ruby carousel_scraper.rb
Carousel Scraper
carousel_scraper.rb contains all the logic that reads the Google search results HTML.
#parse
Parses the HTML and adds all the painting data to the resulting artworks hash, which can be seen in result.json.
#write_to_file
Writes the artworks hash, which contains all the artwork data, to result.json as JSON.
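A minimal sketch of such a write step, using only the standard JSON library. The method signature, the default path argument, and the sample hash are assumptions for illustration, not the repository's actual implementation.

```ruby
require 'json'

# Serializes the artworks hash as indented JSON and writes it to the given path.
def write_to_file(artworks, path = 'result.json')
  File.write(path, JSON.pretty_generate(artworks))
end

# Hypothetical sample data in the shape of an artworks hash.
sample = { 'artworks' => [{ 'name' => 'The Starry Night', 'year' => '1889' }] }
write_to_file(sample, 'result_example.json')
```

`JSON.pretty_generate` is used here so the file is readable when opened directly; `JSON.generate` would produce the same data on a single line.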
In /spec, run
bundle exec rspec carousel_scraper_spec.rb
to test carousel_scraper.rb.
Result:
![Screen Shot 2023-07-21 at 7 44 09 PM](https://private-user-images.githubusercontent.com/6677487/255299954-160b35e0-23e6-4d99-887a-24a711e4c4ae.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjIzNjc4OTMsIm5iZiI6MTcyMjM2NzU5MywicGF0aCI6Ii82Njc3NDg3LzI1NTI5OTk1NC0xNjBiMzVlMC0yM2U2LTRkOTktODg3YS0yNGE3MTFlNGM0YWUucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI0MDczMCUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNDA3MzBUMTkyNjMzWiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9YWI1Njc0OGQyNzU4MTJjMzU3MDZkMjQ5MDhiYzAzMDdhOGNlOWNlMTAzZDRlYzVjNDkxNTI0YzUzMWQ0MDNlOCZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QmYWN0b3JfaWQ9MCZrZXlfaWQ9MCZyZXBvX2lkPTAifQ.GVhAsOso0XUCcs3bf55QoreuKF2GC9PJ5Co7DEVw438)