This project implements a simple web crawler written in Go. Starting from a given URL, it recursively fetches page content and follows links up to a specified depth.
- Go 1.21.5 or later ([download Go](https://go.dev/dl/))
Build the crawler and run it:

```sh
go build
./webcrawler <url> <depth>
```
- Replace `<url>` with the starting URL you want to crawl.
- Replace `<depth>` with the maximum crawl depth (the number of link levels to follow).
```sh
./webcrawler http://golang.org/ 2
```
This command will crawl the Go website (http://golang.org/) up to a depth of 2. The output will display the fetched content and links for each visited URL.
Notes:

- The code uses a `fakeFetcher` for demonstration purposes; a real web crawler would implement a fetcher that retrieves actual web content.
- The `Crawl` function uses goroutines to fetch URLs concurrently.
- A `blockingChannel` and a `WaitGroup` are used to synchronize access to shared state (the set of visited URLs) and to ensure all goroutines finish before the program exits; see the sketch after this list.
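The pattern in the last note matches the classic Go tour crawler exercise. Below is a minimal, hypothetical sketch of how such a concurrent `Crawl` could be wired up; the names `visited`, `claim`, and `stubFetcher` are illustrative rather than taken from this project, and a `sync.Mutex` stands in for the blocking channel (both serve to serialize access to the visited set):

```go
package main

import (
	"fmt"
	"sync"
)

// Fetcher returns the body of a URL and the URLs found on that page.
type Fetcher interface {
	Fetch(url string) (body string, urls []string, err error)
}

// visited is a mutex-guarded set of URLs that have already been claimed,
// so concurrent goroutines never fetch the same page twice.
type visited struct {
	mu   sync.Mutex
	seen map[string]bool
}

// claim marks url as seen and reports whether this caller was the first.
func (v *visited) claim(url string) bool {
	v.mu.Lock()
	defer v.mu.Unlock()
	if v.seen[url] {
		return false
	}
	v.seen[url] = true
	return true
}

// Crawl fetches url and, down to the given depth, everything it links to.
// Each link is crawled in its own goroutine; wg lets the caller block
// until the entire tree of fetches has finished.
func Crawl(url string, depth int, fetcher Fetcher, v *visited, wg *sync.WaitGroup) {
	defer wg.Done()
	if depth <= 0 || !v.claim(url) {
		return
	}
	body, urls, err := fetcher.Fetch(url)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("found: %s %q\n", url, body)
	for _, u := range urls {
		wg.Add(1)
		go Crawl(u, depth-1, fetcher, v, wg)
	}
}

// stubFetcher is an in-memory Fetcher that makes the sketch runnable;
// the project's fakeFetcher plays the same role.
type stubFetcher map[string]struct {
	body string
	urls []string
}

func (f stubFetcher) Fetch(url string) (string, []string, error) {
	if res, ok := f[url]; ok {
		return res.body, res.urls, nil
	}
	return "", nil, fmt.Errorf("not found: %s", url)
}

func main() {
	fetcher := stubFetcher{
		"http://golang.org/":     {"The Go Programming Language", []string{"http://golang.org/pkg/"}},
		"http://golang.org/pkg/": {"Packages", nil},
	}
	v := &visited{seen: make(map[string]bool)}
	var wg sync.WaitGroup
	wg.Add(1)
	go Crawl("http://golang.org/", 2, fetcher, v, &wg)
	wg.Wait()
}
```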
Possible improvements:

- Implement a real web fetching function using the standard `net/http` package (see the sketch after this list).
- Add error handling for network issues.
- Improve concurrency management, e.g. by bounding the number of simultaneous fetches.
- Persist crawled data.
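For the first two items, a network-backed fetcher can be built on the standard `net/http` package. The sketch below is a starting point, not the project's code: `httpFetcher` and `hrefRe` are illustrative names, the client timeout and status check give basic error handling, and the regular-expression link extraction is deliberately naive (a real crawler would use an HTML parser such as `golang.org/x/net/html` and resolve relative links):

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"regexp"
	"time"
)

// hrefRe naively pulls absolute links out of href attributes. It will
// miss or mangle links in real-world HTML, but keeps the sketch
// dependency-free.
var hrefRe = regexp.MustCompile(`href="(https?://[^"]+)"`)

// httpFetcher fetches pages over the network with a bounded timeout,
// so a stalled server cannot hang a crawl goroutine forever.
type httpFetcher struct {
	client *http.Client
}

func newHTTPFetcher() *httpFetcher {
	return &httpFetcher{client: &http.Client{Timeout: 10 * time.Second}}
}

// Fetch downloads url and returns its body plus any absolute links found.
func (f *httpFetcher) Fetch(url string) (string, []string, error) {
	resp, err := f.client.Get(url)
	if err != nil {
		return "", nil, fmt.Errorf("fetch %s: %w", url, err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", nil, fmt.Errorf("fetch %s: unexpected status %s", url, resp.Status)
	}
	data, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", nil, fmt.Errorf("read %s: %w", url, err)
	}
	body := string(data)
	var urls []string
	for _, m := range hrefRe.FindAllStringSubmatch(body, -1) {
		urls = append(urls, m[1])
	}
	return body, urls, nil
}

func main() {
	body, urls, err := newHTTPFetcher().Fetch("http://golang.org/")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("fetched %d bytes, found %d links\n", len(body), len(urls))
}
```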