/crawler

A web crawler written in Go.


crawler

  • Supports crawling user data from a given website.

single-crawler

seed --(requests)--------------> engine
                                 |
                                 V
fetcher(fetch the page) <--- task queue(requests) if len(task queue) <= 0 { quit }
  |                              ^
  V                              |
parser--(new requests)-----------^
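
The loop in the diagram above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the names `Request`, `ParseResult`, `Fetcher`, and `RunSingle` are hypothetical, the fetcher is passed in as a function so the loop can be tried without network access, and collecting items is an assumption based on the "user data" goal stated earlier.

```go
package main

import "log"

// Request pairs a URL with the parser that should handle the fetched page.
// (Hypothetical type, introduced for illustration only.)
type Request struct {
	URL    string
	Parser func(body []byte) ParseResult
}

// ParseResult holds the new requests and the items a parser extracted.
type ParseResult struct {
	Requests []Request
	Items    []interface{}
}

// Fetcher downloads the page behind a URL.
type Fetcher func(url string) ([]byte, error)

// RunSingle drains the task queue sequentially: fetch, parse, enqueue the
// new requests, and quit once the queue is empty.
func RunSingle(fetch Fetcher, seeds ...Request) []interface{} {
	queue := append([]Request(nil), seeds...)
	var items []interface{}

	for len(queue) > 0 {
		r := queue[0]
		queue = queue[1:]

		body, err := fetch(r.URL)
		if err != nil {
			log.Printf("fetch %s: %v", r.URL, err)
			continue // skip this page, keep crawling
		}

		result := r.Parser(body)
		queue = append(queue, result.Requests...) // parser feeds the queue
		items = append(items, result.Items...)
	}
	return items
}
```

Note that this sketch does no URL de-duplication, so a site whose pages link to each other in a cycle would keep the queue from ever emptying.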

concurrent-crawler

engine <----(requests) <-------seeds
 |
 V
scheduler ---> requestqueue(request chan) --->activerequester
 |                ^                               |
 |                |--------worker(queue)<-----    | 
 |                                            ^   |
 V                                            |   V
 workerqueue(worker chan) --------------- activeworker
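
One way to read the scheduler diagram above is as a single goroutine that owns a request queue and a worker queue and pairs their heads whenever both are non-empty. The sketch below is an assumption about how that could look in Go; `QueuedScheduler`, `Submit`, and `WorkerReady` are illustrative names, and `Request` is reduced to a URL to keep the example short.

```go
package main

// Request is a simplified stand-in for a crawl task (hypothetical type).
type Request struct {
	URL string
}

// QueuedScheduler pairs pending requests with idle workers.
type QueuedScheduler struct {
	requestChan chan Request      // requests arriving from the engine
	workerChan  chan chan Request // idle workers announcing themselves
}

// NewQueuedScheduler creates the channels and starts the scheduling loop.
// The loop runs for the lifetime of the program in this sketch.
func NewQueuedScheduler() *QueuedScheduler {
	s := &QueuedScheduler{
		requestChan: make(chan Request),
		workerChan:  make(chan chan Request),
	}
	go s.loop()
	return s
}

// Submit hands a request to the scheduler.
func (s *QueuedScheduler) Submit(r Request) { s.requestChan <- r }

// WorkerReady registers a worker's input channel as idle.
func (s *QueuedScheduler) WorkerReady(w chan Request) { s.workerChan <- w }

func (s *QueuedScheduler) loop() {
	var requestQ []Request
	var workerQ []chan Request
	for {
		// activeWorker stays nil unless both queues have an entry.
		// A send on a nil channel blocks forever, so that select case
		// is effectively disabled until a pairing is possible.
		var activeRequest Request
		var activeWorker chan Request
		if len(requestQ) > 0 && len(workerQ) > 0 {
			activeRequest = requestQ[0]
			activeWorker = workerQ[0]
		}
		select {
		case r := <-s.requestChan:
			requestQ = append(requestQ, r)
		case w := <-s.workerChan:
			workerQ = append(workerQ, w)
		case activeWorker <- activeRequest:
			requestQ = requestQ[1:]
			workerQ = workerQ[1:]
		}
	}
}
```

Keeping both queues inside one goroutine means no mutex is needed: the slices are only ever touched by the scheduling loop, and everything else communicates with it through channels.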

 ---(requests)-------------engine--------->data(items)
 |                           ^
 V                           |(requests, items)
 scheduler---(requests)-> worker(queue)
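
The engine loop in the diagram above might be sketched as follows. This is a simplified illustration under stated assumptions, not the project's implementation: workers share one input channel instead of per-worker queues, the engine stops once no requests are outstanding (a real crawler may simply run forever), and `RunConcurrent`, `Request`, `ParseResult`, and `Fetcher` are hypothetical names.

```go
package main

// Request pairs a URL with the parser for the fetched page (hypothetical type).
type Request struct {
	URL    string
	Parser func(body []byte) ParseResult
}

// ParseResult is what a worker sends back to the engine: requests and items.
type ParseResult struct {
	Requests []Request
	Items    []interface{}
}

// Fetcher downloads the page behind a URL.
type Fetcher func(url string) ([]byte, error)

// RunConcurrent starts workerCount workers, feeds them requests, and
// collects items until no request is outstanding.
func RunConcurrent(fetch Fetcher, workerCount int, seeds ...Request) []interface{} {
	if workerCount < 1 {
		workerCount = 1
	}
	in := make(chan Request)      // scheduler -> workers
	out := make(chan ParseResult) // workers -> engine

	for i := 0; i < workerCount; i++ {
		go func() {
			for r := range in {
				body, err := fetch(r.URL)
				if err != nil {
					out <- ParseResult{} // report an empty result on failure
					continue
				}
				out <- r.Parser(body)
			}
		}()
	}

	pending := 0 // requests submitted whose results have not arrived yet
	submit := func(r Request) {
		pending++
		// Send from a separate goroutine so the engine never blocks
		// on a busy worker while results are waiting to be read.
		go func() { in <- r }()
	}
	for _, r := range seeds {
		submit(r)
	}

	var items []interface{}
	for pending > 0 {
		result := <-out
		pending--
		items = append(items, result.Items...)
		for _, r := range result.Requests {
			submit(r)
		}
	}
	close(in) // every submitted request was consumed, so workers can exit
	return items
}
```

Because several workers run at once, the order of the returned items is not deterministic; only the set of items is.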

Screenshot