- Use the URL prefix of the website as the spider name, table name, and website name.
- Modify pipelines.py to create a table for the corresponding blog website.
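A minimal sketch of what that pipeline change might look like, assuming SQLite as the backing store; the column names (`title`, `url`, `date`) are illustrative, not the project's actual schema:

```python
import sqlite3

class BlogTablePipeline:
    """Create one table per blog website and insert scraped items into it.

    Sketch only: assumes SQLite and items with 'title', 'url', 'date' fields;
    the real project may use a different database or schema.
    """

    def __init__(self, db_path="blogs.db"):
        self.db_path = db_path

    def open_spider(self, spider):
        self.conn = sqlite3.connect(self.db_path)
        # the table name follows the spider name, per the naming convention above
        self.table = spider.name
        self.conn.execute(
            f'CREATE TABLE IF NOT EXISTS "{self.table}" '
            "(title TEXT, url TEXT UNIQUE, date TEXT)"
        )

    def process_item(self, item, spider):
        # INSERT OR IGNORE deduplicates on the UNIQUE url column
        self.conn.execute(
            f'INSERT OR IGNORE INTO "{self.table}" VALUES (?, ?, ?)',
            (item["title"], item["url"], item["date"]),
        )
        self.conn.commit()
        return item

    def close_spider(self, spider):
        self.conn.close()
```

The pipeline would still need to be registered under `ITEM_PIPELINES` in settings.py for Scrapy to run it.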
- Modify process_response in middlewares.py to handle websites whose content is loaded dynamically.
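The idea behind that hook can be sketched as follows. This is a stripped-down illustration, not the project's middleware: Scrapy's real Request/Response objects are replaced with plain duck-typed attributes, and the `<article>` marker is a hypothetical heuristic for "the page was rendered server-side":

```python
class DynamicPageMiddleware:
    """Sketch of a downloader middleware's process_response hook.

    Assumption: pages rendered server-side contain an <article> tag; if it
    is missing, the content was probably injected by JavaScript and should
    be re-rendered (e.g. with a headless browser) before parsing.
    """

    CONTENT_MARKER = b"<article"

    def __init__(self):
        self.pending_render = []  # URLs flagged for browser rendering

    def needs_rendering(self, response):
        # heuristic check on the raw body; real code might test a specific XPath
        return self.CONTENT_MARKER not in response.body

    def process_response(self, request, response, spider):
        if self.needs_rendering(response):
            # the real middleware would drive a headless browser here and
            # return a new HtmlResponse built from the rendered page source
            self.pending_render.append(request.url)
        return response
```

In a real Scrapy project this class would be enabled via `DOWNLOADER_MIDDLEWARES` in settings.py.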
- Write your own crawler in the spiders folder.
- Use an XPath helper extension in your browser to quickly locate elements and work out selectors.
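Once the extension has helped you find an expression, the same path goes into your spider's extraction code. A small self-contained illustration using the standard library's limited XPath support; the HTML snippet and class names are made up:

```python
import xml.etree.ElementTree as ET

# toy, well-formed snippet standing in for a real blog listing page
html = """
<html>
  <body>
    <div class="post"><a href="/post/1">First post</a></div>
    <div class="post"><a href="/post/2">Second post</a></div>
    <div class="sidebar"><a href="/about">About</a></div>
  </body>
</html>
"""

tree = ET.fromstring(html)
# XPath-style query: every link inside a div whose class is "post"
links = tree.findall(".//div[@class='post']/a")
for a in links:
    print(a.get("href"), a.text)  # href and link text of each post
```

In a real spider the equivalent expression would be passed to `response.xpath()`, which supports full XPath 1.0 rather than ElementTree's subset.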
- scrapy crawl cybersecurity_att -s LOG_FILE=all.log
- scrapy crawl carnal0wnage -s LOG_FILE=all.log
- scrapy crawl insights_sei -s LOG_FILE=all.log
- scrapy crawl coresecurity_blog -s LOG_FILE=all.log
- scrapy crawl symantec-enterprise-blogs -s LOG_FILE=all.log