MyRespect/BlogCrawler

Security-related Blog Crawler

Python

Cyber-Physical Threat Intelligence (CTI) Blog Crawler

Developer

Use the URL prefix of the website as the spider name, table name, and web name.
Modify pipelines.py to create a table for the corresponding blog website.
Modify process_response in middlewares.py to handle dynamically loaded websites.
Write your own crawler in the spider folder.
Use an XPath helper extension in your browser to locate page elements quickly.
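
As a rough illustration of the first two steps, the sketch below shows a pipeline that creates one table per spider and names it after the spider. It is not the project's actual pipelines.py: the class name BlogTablePipeline, the SQLite storage, and the url/title/content columns are all assumptions made for the example. Only the open_spider, process_item, and close_spider hooks follow the usual Scrapy pipeline convention.

```python
import sqlite3


class BlogTablePipeline:
    """Hypothetical pipeline: one table per spider, named after the spider."""

    def __init__(self, db_path=":memory:"):
        # SQLite is an assumption; the real project may use another database.
        self.db_path = db_path
        self.conn = None
        self.table = None

    def open_spider(self, spider):
        self.conn = sqlite3.connect(self.db_path)
        # Quote the identifier so spider names containing hyphens
        # (e.g. symantec-enterprise-blogs) are still valid table names.
        self.table = '"' + spider.name.replace('"', '""') + '"'
        self.conn.execute(
            f"CREATE TABLE IF NOT EXISTS {self.table} "
            "(url TEXT PRIMARY KEY, title TEXT, content TEXT)"
        )

    def process_item(self, item, spider):
        # Assumed item fields: url, title, content.
        self.conn.execute(
            f"INSERT OR REPLACE INTO {self.table} (url, title, content) "
            "VALUES (?, ?, ?)",
            (item["url"], item.get("title"), item.get("content")),
        )
        self.conn.commit()
        return item

    def close_spider(self, spider):
        self.conn.close()
```

Because the table name is taken from spider.name, keeping the spider name and the table name identical needs no extra configuration in this sketch.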

Usage

scrapy crawl cybersecurity_att -s LOG_FILE=all.log
scrapy crawl carnal0wnage -s LOG_FILE=all.log
scrapy crawl insights_sei -s LOG_FILE=all.log
scrapy crawl coresecurity_blog -s LOG_FILE=all.log
scrapy crawl symantec-enterprise-blogs -s LOG_FILE=all.log
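
To run every spider in one go, the commands above could be wrapped in a small script. This is only a convenience sketch, assuming scrapy is on the PATH and the script is started from the project root; the helper names build_command and run_all are hypothetical and not part of the repository.

```python
import subprocess

# Spider names taken from the usage commands above.
SPIDERS = [
    "cybersecurity_att",
    "carnal0wnage",
    "insights_sei",
    "coresecurity_blog",
    "symantec-enterprise-blogs",
]


def build_command(spider_name, log_file="all.log"):
    """Return the argument list for one `scrapy crawl` invocation."""
    return ["scrapy", "crawl", spider_name, "-s", f"LOG_FILE={log_file}"]


def run_all(spiders=SPIDERS):
    """Run each spider in turn and return a mapping of name to exit code."""
    results = {}
    for name in spiders:
        # check=False so that one failing spider does not stop the rest.
        results[name] = subprocess.run(build_command(name), check=False).returncode
    return results
```

Calling run_all() from the project root runs the five spiders sequentially and returns their exit codes.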