go get github.com/WithGJR/search-result-crawler
First, you need to initialize a `Crawler` struct. The `Crawler` struct has two fields: `Input` and `Parser`. `Parser` is an interface with the following definition:
```go
type Parser interface {
	GetSearchResultPageURL(keyword string, page int) string
	Parse(doc *goquery.Document, keyword string, page int, ch chan IntermediatePair)
}
```
So you need to implement this interface yourself, or use a ready-made parser such as `parser.GoogleSearchParser` (used in the example below).
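As a rough illustration of a custom parser, the sketch below implements the URL-building half of the interface. `MyGoogleParser`, the query format, and the paging math are assumptions for illustration only; a real implementation must also provide the `Parse` method, which extracts results from the fetched `goquery.Document`.

```go
package main

import (
	"fmt"
	"net/url"
)

// MyGoogleParser is a hypothetical parser. It builds the search-result
// URL for a given keyword and (1-based) page number.
type MyGoogleParser struct{}

func (p MyGoogleParser) GetSearchResultPageURL(keyword string, page int) string {
	// Assumes 10 results per page; the `start` query parameter is 0-based.
	return fmt.Sprintf("https://www.google.com/search?q=%s&start=%d",
		url.QueryEscape(keyword), (page-1)*10)
}

func main() {
	p := MyGoogleParser{}
	fmt.Println(p.GetSearchResultPageURL("golang tutorial", 2))
}
```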
After initialization, just call the `Crawler`'s `Start()` method. It runs the crawling tasks and returns the parsing result.
The result has type `map[string][][]Result`. The key of the map is a keyword you specified; the value is a slice of search-result pages for that keyword, and each element of that slice is in turn a slice of `Result` (one entry per search result on the page).
The structure of `Result` is:

```go
type Result struct {
	Title       string
	URL         string
	Description string
}
```
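To make the nesting of `map[string][][]Result` concrete, here is a small self-contained sketch with made-up data (not real crawler output) showing how keyword, page, and result are indexed:

```go
package main

import "fmt"

// Same shape as the library's Result type.
type Result struct {
	Title       string
	URL         string
	Description string
}

func main() {
	// Hypothetical data shaped like the crawler's return value:
	// keyword -> pages -> results on each page.
	r := map[string][][]Result{
		"seo": {
			{{Title: "First result on page 1", URL: "https://example.com/a"}},
			{{Title: "First result on page 2", URL: "https://example.com/b"}},
		},
	}
	firstPage := r["seo"][0]          // all results from the first crawled page
	fmt.Println(firstPage[0].Title)   // first result on that page
}
```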
The following is a simple example showing how to use this project:
```go
c := crawler.Crawler{
	Input: map[string][]int{
		"golang tutorial": []int{1, 2, 3},
		"seo":             []int{1, 2, 3, 5},
	},
	Parser: &parser.GoogleSearchParser{},
}
r := c.Start()

// r["seo"][0] is a slice containing the first page of search
// results for the keyword "seo".
```
I also wrote a tutorial in Chinese and posted it on my blog: "Golang - 如何寫爬蟲抓搜尋引擎的搜尋結果?" ("Golang: how to write a crawler to scrape search-engine results?").