
Distributed redis-based web crawler framework for colly

Collyzar provides simple configuration and tools for distributed crawling and scraping.


  • Simple configuration and clean API
  • Distributed crawling/scraping
  • Built-in global bloom filter
  • Built-in spider cache
  • Supports Redis commands
  • Multi-machine load balancing
  • Pause or stop all crawling machines
  • Pass additional information to a crawler, read it inside the crawler, and store it in the database
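
The global bloom filter in the list above is presumably what lets several machines avoid crawling the same URL twice. As a rough illustration of the idea only, here is a minimal in-memory Bloom filter. This is not collyzar's implementation, and the names `bloomFilter`, `newBloomFilter`, `Add`, and `MayContain` are hypothetical.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// bloomFilter is a hypothetical in-memory illustration; it is not a collyzar type.
type bloomFilter struct {
	bits []bool
	k    uint32 // number of bit positions probed per item
}

func newBloomFilter(size int, k uint32) *bloomFilter {
	return &bloomFilter{bits: make([]bool, size), k: k}
}

// positions derives k bit positions from two FNV hashes (double hashing).
func (b *bloomFilter) positions(item string) []uint32 {
	h1 := fnv.New32a()
	h1.Write([]byte(item))
	h2 := fnv.New32()
	h2.Write([]byte(item))
	a, c := h1.Sum32(), h2.Sum32()
	out := make([]uint32, b.k)
	for i := uint32(0); i < b.k; i++ {
		out[i] = (a + i*c) % uint32(len(b.bits))
	}
	return out
}

// Add marks an item as seen.
func (b *bloomFilter) Add(item string) {
	for _, p := range b.positions(item) {
		b.bits[p] = true
	}
}

// MayContain returns false only if the item was definitely never added.
// A true result can be a false positive.
func (b *bloomFilter) MayContain(item string) bool {
	for _, p := range b.positions(item) {
		if !b.bits[p] {
			return false
		}
	}
	return true
}

func main() {
	seen := newBloomFilter(1024, 3)
	seen.Add("https://www.amazon.com")
	fmt.Println(seen.MayContain("https://www.amazon.com")) // true: added items are always reported
}
```

A crawler would check `MayContain` before fetching a URL and call `Add` afterwards. The trade-off is a small chance of skipping a URL that was never crawled, in exchange for constant memory.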


Add collyzar to your go.mod file:

module github.com/x/y

go 1.14

require (
        github.com/Zartenc/collyzar/v2 latest
)

Example Usage

See examples folder for more detailed examples.

Crawler cluster machine

SpiderName must be unique.

Once started, the crawler keeps monitoring the Redis crawler queue and crawls whatever arrives until it receives a pause or stop signal.

func main() {
	cs := &collyzar.CollyzarSettings{
		SpiderName: "zarten",
		Domain:     "www.amazon.com",
		RedisIp:    "",
	}
	collyzar.Run(myResponse, cs, nil)
}

// myResponse is the callback passed to collyzar.Run.
func myResponse(response *collyzar.ZarResponse) {
	// Process the response here.
}
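
The monitoring behaviour described above (keep consuming the queue until a pause or stop signal arrives) can be pictured with a small in-process sketch. It uses Go channels in place of Redis and only illustrates the control flow; it is not collyzar's code, and `signal`, `runWorker`, and the channel names are hypothetical.

```go
package main

import "fmt"

type signal int

const (
	sigPause signal = iota
	sigWakeup
	sigStop
)

// runWorker consumes URLs from queue until it receives sigStop.
// While paused it ignores the queue and waits only for control signals.
func runWorker(queue <-chan string, control <-chan signal, handle func(string)) {
	paused := false
	for {
		// Receiving from a nil channel blocks forever, so setting q to nil
		// disables the queue case in the select while the worker is paused.
		q := queue
		if paused {
			q = nil
		}
		select {
		case s := <-control:
			switch s {
			case sigPause:
				paused = true
			case sigWakeup:
				paused = false
			case sigStop:
				return
			}
		case url := <-q:
			handle(url)
		}
	}
}

func main() {
	queue := make(chan string)
	control := make(chan signal)
	done := make(chan struct{})

	go func() {
		runWorker(queue, control, func(u string) { fmt.Println("crawl", u) })
		close(done)
	}()

	queue <- "https://www.amazon.com"
	control <- sigStop
	<-done
}
```

In the real framework the queue and the signals live in Redis, so any machine running the crawler can pick up work and react to the control machine.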

Control machine

Push a URL to the Redis queue

func main() {
	ts := collyzar.NewToolSpider("", 6379, "", "zarten")

	url := "https://www.amazon.com"
	pushInfo := collyzar.PushInfo{Url: url}

	err := ts.PushToQueue(pushInfo)
	if err != nil {
		fmt.Println(err)
	}
}


Collyzar also provides tools to stop and pause crawlers.

Stop all crawlers

func main() {
	ts := collyzar.NewToolSpider("", 6379, "", "zarten")

	err := ts.StopSpiders()
	if err != nil {
		fmt.Println(err)
	}
}

Pause all crawlers

After a pause, every crawler process stays idle.
Call the WakeupSpiders method to wake the crawlers up again.

func main() {
	ts := collyzar.NewToolSpider("", 6379, "", "zarten")

	err := ts.PauseSpiders()
	if err != nil {
		fmt.Println(err)
	}
}


Bugs or suggestions? Visit the issue tracker


If you wish to contribute to this project, please branch and issue a pull request against master ("GitHub Flow").