Create your very own web scraper and crawler using Go and Colly!
```
📂 makescraper
├── README.md
└── scrape.go
```
- Visit [github.com/new](https://github.com/new) and create a new repository named `makescraper`.
- Run each command line-by-line in your terminal to set up the project:

  ```bash
  $ git clone git@github.com:Make-School-Labs/makescraper.git
  $ cd makescraper
  $ git remote rm origin
  $ git remote add origin git@github.com:HexSeal/makescraper.git
  $ go mod download
  ```
- Open `README.md` in your editor and replace all instances of `HexSeal` with your GitHub username to enable the Go Report Card badge.
Complete the tasks in the order they appear. Use GitHub Task List syntax to update the task list.
- [ ] **IMPORTANT**: Complete the Web Scraper Workflow worksheet distributed in class.
- [ ] Create a `struct` to store your data.
- [ ] Refactor the `c.OnHTML` callback on line 16 to use the selector(s) you tested while completing the worksheet.
- [ ] Print the data you scraped to `stdout`.
- [ ] Add more fields to your `struct`. Extract multiple data points from the website. Print them to `stdout` in a readable format.
- [ ] Serialize the `struct` you created to JSON. Print the JSON to `stdout` to validate it.
- [ ] Write scraped data to a file named `output.json`.
- [ ] Add, commit, and push to GitHub.
- TBA 02/10!
- BEW 2.5 - Scraping the Web: Concepts and examples covered in class related to web scraping and crawling.
- Colly - Docs: Check out the sidebar for 20+ examples!
- Ali Shalabi - Syntax-Helper: Command line interface to help generate proper code syntax, pulled from the Golang documentation.
- JSON to Struct: Paste any JSON data to generate a Go struct capable of storing it.
- GoByExample - JSON: Covers Go's built-in support for JSON encoding and decoding to and from built-in and custom data types (structs).
- GoByExample - Writing Files: Covers creating new files and writing to them.