Live Scraper is a service that receives a request for an Amazon ID, scrapes Amazon's website for information based on that ID, and responds with a JSON representation of that resource, or with an error signifying that it failed to find any data for it.
- `go get ./...`
- `cd` to the project directory
- `go build`
- `./live-scraper`
- Make a request to `localhost:8080/movie/amazon/{insert_amazon_id_here}`
At the moment, the site version (e.g. .com, .de) is set manually; it can be changed on line 32 of main.go.
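As a rough illustration only, hardcoding the site version might look something like the sketch below; the variable name, value, and URL path are assumptions, not the actual contents of main.go.

```go
package main

import "fmt"

// Hypothetical sketch: the site version is a value you edit by hand.
// The real name and location in main.go may differ.
var amazonDomain = "www.amazon.com" // swap for "www.amazon.de", etc.

// amazonURL builds a product URL for a given Amazon ID.
// The "/dp/" path is an assumption for illustration.
func amazonURL(id string) string {
	return "https://" + amazonDomain + "/dp/" + id
}

func main() {
	fmt.Println(amazonURL("EXAMPLE_ID"))
}
```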
Initially, I wanted to simply use the golang.org/x/net/html package to parse the HTML nodes found on the Amazon page. While this was possible to an extent without too much work, I found myself trying to figure out how to match CSS selectors for certain pieces of data, which ultimately would have led me down the path of rewriting much of the work already done in the goquery package by PuerkitoBio on GitHub. goquery seemed like the most practical way to solve this problem, and it was fairly fast.
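As a rough illustration of why goquery was the better fit, here is a minimal sketch of selecting an element by CSS selector; the URL and the `#productTitle` selector are placeholders, not the ones this scraper actually uses.

```go
package main

import (
	"fmt"
	"log"
	"net/http"

	"github.com/PuerkitoBio/goquery"
)

func main() {
	// Placeholder URL; the real scraper builds this from the requested Amazon ID.
	resp, err := http.Get("https://www.amazon.com/dp/SOME_ID")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	doc, err := goquery.NewDocumentFromReader(resp.Body)
	if err != nil {
		log.Fatal(err)
	}

	// Query by CSS selector -- the part that plain golang.org/x/net/html
	// would have required implementing by hand.
	title := doc.Find("#productTitle").Text()
	fmt.Println(title)
}
```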
- Request comes in
- Parse the request
- Make our own request to amazon.com/whatever
- Get the HTML response back from Amazon
- Parse it
- Get the correct elements out; if info is missing or broken, bail with an error message to the user
- Make a struct
- Marshal the data
- Return JSON to the user
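In code, the flow above might look roughly like this sketch. The handler, route, `fetchAndParse` stub, and `Movie` struct are all placeholders for illustration, not the project's actual code (which is split across main.go, parse.go, marshal.go, and data.go).

```go
package main

import (
	"encoding/json"
	"errors"
	"net/http"
	"strings"
)

// Movie is a placeholder for the struct built from the scraped page.
type Movie struct {
	ID    string `json:"id"`
	Title string `json:"title"`
}

// fetchAndParse stands in for the Amazon request and HTML parsing steps.
func fetchAndParse(id string) (*Movie, error) {
	if id == "" {
		return nil, errors.New("no Amazon ID supplied")
	}
	return &Movie{ID: id, Title: "placeholder"}, nil
}

func movieHandler(w http.ResponseWriter, r *http.Request) {
	// Parse the request: pull the Amazon ID out of the URL.
	id := strings.TrimPrefix(r.URL.Path, "/movie/amazon/")

	// Make our own request to Amazon and parse the HTML we get back.
	movie, err := fetchAndParse(id)
	if err != nil {
		// Missing/broken info: bail with an error message to the user.
		http.Error(w, err.Error(), http.StatusNotFound)
		return
	}

	// Marshal the data and return JSON to the user.
	body, err := json.Marshal(movie)
	if err != nil {
		http.Error(w, "failed to encode response", http.StatusInternalServerError)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	w.Write(body)
}

func main() {
	http.HandleFunc("/movie/amazon/", movieHandler)
	http.ListenAndServe(":8080", nil)
}
```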
- Talking to the user (main.go)
- Talking to Amazon (parse.go)
- Handling HTML (parse.go)
- Prepping data for the user (marshal.go/data.go)