istresearch/scrapy-cluster
This Scrapy project uses Redis and Kafka to create a distributed on demand scraping cluster.
Python · MIT
Issues
- 4
runspider: error: Unable to load 'link_spider.py': attempted relative import with no known parent package
#268 opened by BeamoINT - 1
ERROR: Unable to connect to Kafka in Pipeline due to attempt to connect already-connected SSLSocket!, raising exit flag.
#269 opened by BeamoINT - 2
Is this project no longer supported?
#266 opened by flyingdev - 0
ui exception No connection adapters were found
#260 opened by new-wxw - 1
Crawler unhandled exceptions not logged
#259 opened by getorca - 7
scrapy-cluster don't scroll the entire pages
#258 opened by RochdiBoudokhane - 5
Crawl complete signal
#193 opened by wysiwygism
- 0
Incremented fail stats
#254 opened by mingxuan1 - 2
Upgrading the ELK stack
#253 opened by 4OH4 - 2
Scrape delay
#250 opened by tluyben - 2
TypeError: can't pickle thread.lock objects
#252 opened by benjaminelkrieff - 9
Future of the project
#235 opened by demisx - 5
maxdepth can not large than 2
#241 opened by anthony9981
- 2
Update Releases, Docker compose Issues
#238 opened by MrMoon
- 6
Docker ERROR: Could not ping Zookeeper
#220 opened by Shique
- 4
A mistake in parameter deliver through two method!
#225 opened by kevin-ZZZ - 2
supervisord install error while vagrant up
#226 opened by fatemeeeeeee - 2
Why encoding byted body
#229 opened by YanzhongSu - 2
online integration test error
#224 opened by rawiafray - 1
Can't up cluster from master branch
#227 opened by MasterSergius - 6
`Demo.incoming` topic keeps piling up
#221 opened by YanzhongSu - 2
Cannot turn off DEBUG level log
#223 opened by YanzhongSu
- 3
Replace Kafka with another pub/sub service?
#215 opened by jrmlhermitte
- 3
Lost config from Zookeeper makes spider down
#203 opened by jamesliu668 - 3
cluster mode online test hangs
#213 opened by danmsf
- 1
How does it work with CrawlSpider?
#212 opened by kevin-ZZZ
- 1
Does it support python3?
#211 opened by xingzhicn
- 2
Rest - signal only works in main thread
#204 opened by Shique - 2
Kafka-Monitor stats can potentially accumulate indefinitely if Redis restarts/fails.
#186 opened by devspyrosv - 2
Add login in LinkSpider
#188 opened by josselinlbe
- 1
Rebuilding crawler container fails
#197 opened by mrvnklm - 1
Missing KAFKA_PRODUCER_TOPIC in docker rest settings
#194 opened by aafeher
- 1
Docker vs Vagrant
#192 opened by mshahriarinia - 2
Help processing pages returned by crawler
#187 opened by hellsingnorevy - 5
Support for custom header and cookies for the initial request from kafka_monitor.py feed
#182 opened by knirbhay - 2
Typo on document of Kafka-monitor API section
#183 opened by chihkaiyu - 1
Inter-spider communication
#178 opened by shenbakeshkishore