Dean's Cool Web Crawler

A CLI tool for grabbing anchors from webpages.

Installation

Run the tool directly from the repository root:

go run cmd/main.go -uri https://www.google.com
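
There is no separate install step. To produce a go-crawler binary like the one used in the examples below (assuming the entry point lives under cmd/, as in the go run command above), a standard build would look like:

go build -o go-crawler ./cmd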

Syntax

This program takes one required flag: -uri, a fully-qualified URL for the webpage you wish to scrape.

go-crawler -uri <URL>

On success, it writes the results to a file in your current working directory and prints:

Anchors printed to <wd>/anchors-<timestamp>.txt! Thank you for using my cool tool!

You can add the -outputtoconsole flag if you would prefer to have the URLs printed to your current terminal session:

go-crawler -uri <URL> -outputtoconsole
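
For reference, here is a minimal sketch of how these flags might be wired up with Go's standard flag package. This is illustrative, not the repository's actual code:

    package main

    import (
        "flag"
        "fmt"
        "os"
    )

    func main() {
        // Required: the fully-qualified URL to scrape.
        uri := flag.String("uri", "", "fully-qualified URL of the webpage to scrape")
        // Optional: print anchors to stdout instead of writing a file.
        toConsole := flag.Bool("outputtoconsole", false, "print anchors to the terminal")
        flag.Parse()

        if *uri == "" {
            fmt.Fprintln(os.Stderr, "usage: go-crawler -uri <URL> [-outputtoconsole]")
            os.Exit(1)
        }

        // ... fetch the page, extract anchors, then write to a file or stdout.
        _ = *toConsole
    }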

Tests & Benchmarks

Benchmark specification:

goos: linux
goarch: amd64
pkg: github.com/deanfoley/go-web-crawler/internal
cpu: Intel(R) Core(TM) i5-4300M CPU @ 2.60GHz

Benchmark command: go test --bench=. -benchmem -benchtime=10s -count=5 -run=^#

NOTE: running the full suite this way didn't work properly for some of the functions and resulted in horrible race conditions. Benchmarking them individually (such as with VSCode's Go extension) works fine.
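
To benchmark a single function in isolation from the command line, a pattern like the following should work (the benchmark name here assumes the standard Benchmark<Name> convention):

go test -bench=^BenchmarkGrabWebpage$ -benchmem -benchtime=10s -run=^# ./internal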

/internal

PageGrabber

Test          Iterations (avg)   ns/op (avg)   Bytes/op   Allocs/op
GrabWebpage   58,840             204,057       91,433     77
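
For context on what this benchmark exercises, here is a minimal sketch of what a GrabWebpage-style fetcher could look like. It is an illustrative approximation using net/http, not the repository's actual implementation:

    package internal

    import (
        "fmt"
        "io"
        "net/http"
    )

    // GrabWebpage fetches the page at uri and returns its body as a string.
    // Sketch only: the real implementation may differ.
    func GrabWebpage(uri string) (string, error) {
        resp, err := http.Get(uri)
        if err != nil {
            return "", fmt.Errorf("fetching %s: %w", uri, err)
        }
        defer resp.Body.Close()

        if resp.StatusCode != http.StatusOK {
            return "", fmt.Errorf("unexpected status %s for %s", resp.Status, uri)
        }

        body, err := io.ReadAll(resp.Body)
        if err != nil {
            return "", fmt.Errorf("reading body of %s: %w", uri, err)
        }
        return string(body), nil
    }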

PageParser

Test             Iterations (avg)   ns/op (avg)   Bytes/op   Allocs/op
ExtractAnchors   134,922            12,168        6,552      25
FormatAnchors    626,671            2,718         593        3
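
To give a sense of what the parsing benchmarks measure, here is a sketch of anchor extraction built on golang.org/x/net/html. The choice of that package is an assumption; the actual ExtractAnchors may be implemented differently:

    package internal

    import (
        "strings"

        "golang.org/x/net/html"
    )

    // ExtractAnchors returns the href values of all <a> tags in body.
    // Sketch only, assuming an x/net/html-based parse tree walk.
    func ExtractAnchors(body string) ([]string, error) {
        doc, err := html.Parse(strings.NewReader(body))
        if err != nil {
            return nil, err
        }

        var anchors []string
        var walk func(*html.Node)
        walk = func(n *html.Node) {
            if n.Type == html.ElementNode && n.Data == "a" {
                for _, attr := range n.Attr {
                    if attr.Key == "href" {
                        anchors = append(anchors, attr.Val)
                    }
                }
            }
            for c := n.FirstChild; c != nil; c = c.NextSibling {
                walk(c)
            }
        }
        walk(doc)
        return anchors, nil
    }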

UrlParser

Test         Iterations (avg)   ns/op (avg)   Bytes/op   Allocs/op
ValidURL     725,694            1,834         144        1
InvalidURL   1,000,000          1,128         208        3
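
Validation along these lines is commonly done with the standard net/url package. A sketch, where the function body is an assumption rather than the repository's exact logic:

    package internal

    import (
        "fmt"
        "net/url"
    )

    // ValidateURL reports whether raw is a fully-qualified http(s) URL.
    // Sketch only: the real check may be stricter or looser.
    func ValidateURL(raw string) error {
        u, err := url.ParseRequestURI(raw)
        if err != nil {
            return fmt.Errorf("invalid URL %q: %w", raw, err)
        }
        if (u.Scheme != "http" && u.Scheme != "https") || u.Host == "" {
            return fmt.Errorf("URL %q must be fully qualified with an http(s) scheme and host", raw)
        }
        return nil
    }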

pprof

This project supports pprof!

Pass the -cpuprofile and/or -memprofile flag with a desired output path to write a profile for either:

go run cmd/main.go -uri https://www.vortex.com -cpuprofile cpu.prof -memprofile mem.prof
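
Flags like these are typically wired up with the standard runtime/pprof pattern from the Go documentation. A sketch (the flag names match the ones above; the surrounding program structure is illustrative):

    package main

    import (
        "flag"
        "log"
        "os"
        "runtime"
        "runtime/pprof"
    )

    var (
        cpuprofile = flag.String("cpuprofile", "", "write cpu profile to file")
        memprofile = flag.String("memprofile", "", "write memory profile to file")
    )

    func main() {
        flag.Parse()

        if *cpuprofile != "" {
            f, err := os.Create(*cpuprofile)
            if err != nil {
                log.Fatal(err)
            }
            defer f.Close()
            if err := pprof.StartCPUProfile(f); err != nil {
                log.Fatal(err)
            }
            defer pprof.StopCPUProfile()
        }

        // ... run the crawler ...

        if *memprofile != "" {
            f, err := os.Create(*memprofile)
            if err != nil {
                log.Fatal(err)
            }
            defer f.Close()
            runtime.GC() // get up-to-date allocation statistics
            if err := pprof.WriteHeapProfile(f); err != nil {
                log.Fatal(err)
            }
        }
    }

The resulting files can then be inspected with, for example, go tool pprof cpu.prof.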