Gobot

A web crawler written in Go

Building and Running

Build gobot by running the go build command from the root folder. Once built, you can run:

gobot crawl -domain foo.com

to crawl the foo.com domain. The results are sent to stdout.

To run tests, navigate to the root folder and run:

go test ./...

The output from the tests will be sent to the terminal.

Gobot was built to do the following:

Crawl a single domain specified by the user, and do not follow links to subdomains
Maintain a collection of links (as identified by the 'href' attribute of the 'a' tag) and static assets (as identified by the 'link', 'script', and 'img' tags)
Output this information to stdout, with each crawled path and related links/static assets
Remove the hash fragment (i.e. everything after '#') when crawling pages
Consider query parameters (i.e. everything after '?') as identifying a unique URL. For example, /foo and /foo?bar=baz will both be crawled and considered unique URLs

Gobot has some room for improvement. The following is a collection of features that would improve Gobot:

Ideas and thoughts are always welcome

MIT