XYWYCrawler: A Python repository from liuming-dev

# XYWYCrawler, crawler in action!

Description: This application collects data from a website (question lists organized by day) that holds more than 100 million records, so several strategies are needed to ensure that all the data can be crawled in an acceptable time. The strategies used are as follows:

Strategies

Multithreading
Multiprocessing
Redis as the task queue
RPC to share the message source
DBHelper to maintain a connection pool
Message consumers running on 4 machines
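
The producer/consumer pattern behind these strategies can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: an in-process `queue.Queue` stands in for the Redis task queue so the example runs without a server, and `daily_tasks`, `fetch_question_list`, `consume`, and `crawl` are hypothetical names. In a setup like the one described, a Redis list shared by all machines (for example, filled with `LPUSH` and drained with `BRPOP`) would presumably replace the local queue, and each machine would run several such worker processes.

```python
import queue
import threading
from datetime import date, timedelta


def daily_tasks(start, end):
    """Yield one task (an ISO date string) per day in [start, end]."""
    day = start
    while day <= end:
        yield day.isoformat()
        day += timedelta(days=1)


def fetch_question_list(day):
    """Hypothetical placeholder for the real HTTP fetch and parse step."""
    return f"questions for {day}"


def consume(tasks, results, lock):
    """Worker loop: pull tasks until the queue is empty, then exit."""
    while True:
        try:
            day = tasks.get_nowait()
        except queue.Empty:
            return
        record = fetch_question_list(day)
        with lock:  # protect the shared result list
            results.append(record)


def crawl(start, end, n_threads=4):
    """Fill the queue with one task per day and drain it with worker threads."""
    tasks = queue.Queue()  # assumption: stand-in for the Redis task queue
    for task in daily_tasks(start, end):
        tasks.put(task)

    results, lock = [], threading.Lock()
    workers = [
        threading.Thread(target=consume, args=(tasks, results, lock))
        for _ in range(n_threads)
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return sorted(results)


if __name__ == "__main__":
    for line in crawl(date(2015, 1, 1), date(2015, 1, 3)):
        print(line)
```

Splitting the work into one task per day keeps each task small and independent, so consumers on different machines can take tasks from the shared queue without coordinating with each other.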

FAQ

Feel free to contact me at hit_oak_tree@126.com to discuss any questions.
