# XYWYCrawler, crawler in action!
Description: This application collects data from a website (the question list, fetched day by day) that holds more than 100 million records, so several strategies are needed to ensure all the data can be crawled in an acceptable time. The strategies used are as follows:
- Multithreading
- Multiprocessing
- Redis as the task queue
- RPC to share the message source
- DBHelper to maintain a database connection pool
- Message consumers running on 4 machines
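The core pattern behind the first three strategies is a shared task queue drained by a pool of workers. A minimal sketch of that pattern is below; in the real project the queue is Redis (e.g. LPUSH/BRPOP) shared across processes and machines, but here an in-process `queue.Queue` stands in so the example is self-contained, and `crawl_day` is a hypothetical stand-in for fetching one day's question list.

```python
import threading
import queue

task_queue = queue.Queue()
results = []
results_lock = threading.Lock()

def crawl_day(day):
    # Placeholder for fetching and parsing one day's question list.
    return f"questions for {day}"

def worker():
    # Each thread pulls day-tasks until the queue is drained.
    while True:
        try:
            day = task_queue.get_nowait()
        except queue.Empty:
            return
        data = crawl_day(day)
        with results_lock:
            results.append(data)
        task_queue.task_done()

def run(days, n_threads=4):
    for day in days:
        task_queue.put(day)
    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With Redis in place of `queue.Queue`, the same worker loop can run in multiple processes on multiple machines, which is how the multiprocessing and multi-machine consumer strategies compose with this one.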
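The DBHelper strategy amounts to reusing a fixed set of database connections instead of opening one per request. The README does not show DBHelper's actual interface, so the class below is an illustrative sketch using SQLite and a blocking queue as the pool.

```python
import queue
import sqlite3

class DBHelper:
    """Illustrative connection pool: a bounded queue of reusable connections."""

    def __init__(self, db_path=":memory:", size=4):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            # check_same_thread=False lets a pooled connection be used
            # by whichever worker thread acquires it.
            self._pool.put(sqlite3.connect(db_path, check_same_thread=False))

    def acquire(self):
        # Blocks until a connection is free, bounding DB load.
        return self._pool.get()

    def release(self, conn):
        self._pool.put(conn)

helper = DBHelper(size=2)
conn = helper.acquire()
value = conn.execute("SELECT 1 + 1").fetchone()[0]
helper.release(conn)
```

Bounding the pool size keeps the database from being overwhelmed when many crawler threads write results concurrently.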
Feel free to contact me at hit_oak_tree@126.com to discuss this project.