Live Scraper

Live Scraper is a service that receives a request for an Amazon ID, scrapes Amazon's website for information based on that ID, and responds with a JSON representation of that resource, or with an error indicating that no data could be found for it.

How to Build/Run:

cd to the project directory
go get ./...
go build
./live-scraper
Make a request to localhost:8080/movie/amazon/{insert_amazon_id_here}
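
Once the service is running, the last step can also be done from a small Go client. The following is a minimal sketch, not part of the project: the helper name endpointFor and the ID "B00EXAMPLE" are placeholders chosen for illustration, and the shape of the response body is not assumed.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
	"os"
	"time"
)

// endpointFor builds the request URL described in the steps above.
// The ID is path-escaped so that unexpected characters cannot change the route.
func endpointFor(amazonID string) string {
	return "http://localhost:8080/movie/amazon/" + url.PathEscape(amazonID)
}

func main() {
	// "B00EXAMPLE" is a placeholder, not a real Amazon ID.
	client := &http.Client{Timeout: 10 * time.Second}
	resp, err := client.Get(endpointFor("B00EXAMPLE"))
	if err != nil {
		fmt.Fprintln(os.Stderr, "request failed (is ./live-scraper running?):", err)
		return
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Fprintln(os.Stderr, "reading response failed:", err)
		return
	}
	// Print the status line and the raw body (JSON data or an error message).
	fmt.Printf("%s\n%s\n", resp.Status, body)
}
```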

A Note on Requests to Different Locations of Amazon Site:

Currently, the version of the Amazon site to scrape (e.g. .com, .de) is set manually. It can be changed on line 32 of main.go.
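
One way to avoid editing the source for each site would be to read the domain suffix from the environment. This is only a sketch of that idea, not how the project works today: the variable name AMAZON_TLD, the helper amazonHost, and the "www." host prefix are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// amazonHost returns the Amazon host for a top-level domain such as "com" or "de".
// An empty value falls back to "com"; a leading dot (".de") is tolerated.
func amazonHost(tld string) string {
	tld = strings.TrimPrefix(strings.TrimSpace(tld), ".")
	if tld == "" {
		tld = "com"
	}
	return "www.amazon." + tld
}

func main() {
	// AMAZON_TLD is a hypothetical variable name; the project itself
	// hard-codes the site version in main.go.
	fmt.Println(amazonHost(os.Getenv("AMAZON_TLD")))
}
```

With this approach, running the binary with AMAZON_TLD=de set in the environment would target the German site without a rebuild.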

On Package Selection:

Initially, I wanted to simply use the golang.org/x/net/html package to parse the HTML nodes found on the Amazon page. While this was possible to an extent without too much work, I found myself trying to figure out how to match CSS selectors for certain pieces of data, which ultimately would have led me to rewrite much of what the goquery package (PuerkitoBio/goquery on GitHub) already provides. Using goquery seemed like the most practical way to solve this problem, and it was fairly speedy.

How it Works

Request comes in
Parse the request
Make our own request to amazon.com/whatever
Get the HTML response back from Amazon
Parse it
Get the correct elements out. If information is missing or broken, bail with an error message to the user
Build a struct
Marshal the data
Return JSON to the user

Talking to the user (main.go)
Talking to Amazon (parse.go)
Handling HTML (parse.go)
Prepping data for the user (marshal.go/data.go)
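
The flow above can be sketched with the standard library alone. This is an illustrative outline under stated assumptions, not the project's actual code: the Movie fields, the scrapeFunc type, stubScrape, and the 400/404 status codes are all hypothetical, and the stub stands in for the real fetch-and-parse work done with goquery in parse.go.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"log"
	"net/http"
	"net/http/httptest"
	"strings"
)

// Movie is a hypothetical response shape; the real struct lives in data.go.
type Movie struct {
	ID    string `json:"id"`
	Title string `json:"title"`
}

// scrapeFunc stands in for fetching and parsing the Amazon page.
type scrapeFunc func(id string) (Movie, error)

var errNotFound = errors.New("no data found for that ID")

// stubScrape pretends that exactly one ID exists, so the flow can run offline.
func stubScrape(id string) (Movie, error) {
	if id == "B00EXAMPLE" {
		return Movie{ID: id, Title: "Example Title"}, nil
	}
	return Movie{}, errNotFound
}

// movieHandler parses the request, runs the scrape, and returns JSON,
// bailing with an error message when the ID or the data is missing.
func movieHandler(scrape scrapeFunc) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		id := strings.TrimPrefix(r.URL.Path, "/movie/amazon/")
		if id == "" || strings.Contains(id, "/") {
			writeJSON(w, http.StatusBadRequest, map[string]string{"error": "missing or malformed Amazon ID"})
			return
		}
		movie, err := scrape(id)
		if err != nil {
			writeJSON(w, http.StatusNotFound, map[string]string{"error": err.Error()})
			return
		}
		writeJSON(w, http.StatusOK, movie)
	}
}

// writeJSON marshals v and writes it with the given status code.
func writeJSON(w http.ResponseWriter, status int, v interface{}) {
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(status)
	if err := json.NewEncoder(w).Encode(v); err != nil {
		log.Printf("encoding response: %v", err)
	}
}

// respond sends an in-memory GET request to h and returns the status and body.
func respond(h http.Handler, path string) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code, strings.TrimSpace(rec.Body.String())
}

func main() {
	status, body := respond(movieHandler(stubScrape), "/movie/amazon/B00EXAMPLE")
	fmt.Println(status, body)
}
```

Passing the scrape step in as a function keeps the HTTP handling separate from the HTML handling, which mirrors the split between main.go and parse.go and lets the handler be exercised without contacting Amazon.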
