On Ubuntu, install the following libraries before running MSpider.
You can install them with pip, easy_install, or apt-get.
- lxml
- chardet
- splinter
- gevent
- phantomjs
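
A quick way to confirm the dependencies above are in place is a small check script. This is only a sketch and not part of MSpider: it assumes a Python 3 interpreter, assumes the four Python packages are importable under the names in the list, and treats phantomjs as an external binary on `PATH`. The function name `missing_dependencies` is hypothetical.

```python
import importlib.util
import shutil

# Python packages from the list above (assumed import names).
PYTHON_PACKAGES = ["lxml", "chardet", "splinter", "gevent"]
# phantomjs is a standalone executable, so it is looked up on PATH instead.
EXTERNAL_BINARIES = ["phantomjs"]


def missing_dependencies(packages=PYTHON_PACKAGES, binaries=EXTERNAL_BINARIES):
    """Return the names that cannot be found on this machine."""
    missing = [name for name in packages
               if importlib.util.find_spec(name) is None]
    missing += [name for name in binaries if shutil.which(name) is None]
    return missing


if __name__ == "__main__":
    missing = missing_dependencies()
    print("Missing:", ", ".join(missing) if missing else "nothing")
```

Anything reported as missing can then be installed with whichever of pip, easy_install, or apt-get provides it on your system.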
- Use MSpider to collect vulnerability information from wooyun.org.
python mspider.py -u "http://www.wooyun.org/bugs/" --focus-domain "wooyun.org" --filter-keyword "xxx" --focus-keyword "bugs" -t 15 --random-agent true
- Use MSpider to collect news from news.sina.com.cn.
python mspider.py -u "http://news.sina.com.cn/c/2015-12-20/doc-ifxmszek7395594.shtml" --focus-domain "news.sina.com.cn" -t 15 --random-agent true
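
The focus and filter options used in the commands above can be read as a per-URL decision: keep a link only if it is on a focus domain, contains the focus keyword, and does not contain the filter keyword. The sketch below illustrates that reading in Python 3. It is an assumption about the semantics, not MSpider's actual code, and `should_crawl` and its parameters are hypothetical names.

```python
from urllib.parse import urlparse


def _in_domain(host, domain):
    """True if host is the domain itself or one of its subdomains."""
    return host == domain or host.endswith("." + domain)


def should_crawl(url, focus_domain=None, filter_domain=None,
                 focus_keyword=None, filter_keyword=None):
    """Decide whether a discovered URL should be queued (assumed semantics)."""
    host = urlparse(url).hostname or ""
    if focus_domain and not _in_domain(host, focus_domain):
        return False  # outside the domain we care about
    if filter_domain and _in_domain(host, filter_domain):
        return False  # explicitly excluded domain
    if focus_keyword and focus_keyword not in url:
        return False  # URL lacks the required keyword
    if filter_keyword and filter_keyword in url:
        return False  # URL contains the excluded keyword
    return True


if __name__ == "__main__":
    # Mirrors the options of the wooyun.org command above.
    print(should_crawl("http://www.wooyun.org/bugs/",
                       focus_domain="wooyun.org",
                       focus_keyword="bugs",
                       filter_keyword="xxx"))
```

Under these assumed rules, the start URL of the first command passes, while a link on another host or one containing `xxx` would be skipped.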
- Crawling and storage of information.
- Distributed crawling.
     __  __  _____       _     _
    |  \/  |/ ____|     (_)   | |
    | \  / | (___  _ __  _  __| | ___ _ __
    | |\/| |\___ \| '_ \| |/ _` |/ _ \ '__|
    | |  | |____) | |_) | | (_| |  __/ |
    |_|  |_|_____/| .__/|_|\__,_|\___|_|
                  | |
Author: Manning23
-h, --help show this help message and exit
Target URL (e.g. "http://www.site.com/")
Max number of concurrent HTTP(s) requests (default 10)
Crawling depth
Crawling number
--time=MSPIDER_TIME Crawl time
HTTP Referer header value
HTTP Cookie header value
Crawling mode: Static_Spider: 0, Dynamic_Spider: 1, Mixed_Spider: 2
Crawling strategy: Breadth-first: 0, Depth-first: 1, Random-first: 2
Focus keyword in URL
Filter keyword in URL
Filter domain
Focus domain
Use randomly selected HTTP User-Agent header value
Will show more information
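
The three crawling strategies listed in the help text differ only in which pending URL is visited next. The following Python 3 sketch shows one common way to model that with a single queue. It is an illustration under assumptions, not MSpider's implementation: the `Frontier` class and the constant names are hypothetical, and only the numeric values 0, 1, and 2 come from the help text.

```python
import random
from collections import deque

# Strategy values as listed in the help text.
BREADTH_FIRST, DEPTH_FIRST, RANDOM_FIRST = 0, 1, 2


class Frontier:
    """Pending-URL container whose pop order depends on the strategy."""

    def __init__(self, policy=BREADTH_FIRST, seed=None):
        if policy not in (BREADTH_FIRST, DEPTH_FIRST, RANDOM_FIRST):
            raise ValueError("unknown crawling strategy: %r" % (policy,))
        self.policy = policy
        self._urls = deque()
        self._rng = random.Random(seed)

    def push(self, url):
        self._urls.append(url)

    def pop(self):
        if self.policy == BREADTH_FIRST:
            return self._urls.popleft()  # oldest URL first (FIFO)
        if self.policy == DEPTH_FIRST:
            return self._urls.pop()      # newest URL first (LIFO)
        # Random-first: pick any pending URL uniformly at random.
        index = self._rng.randrange(len(self._urls))
        url = self._urls[index]
        del self._urls[index]
        return url

    def __len__(self):
        return len(self._urls)


if __name__ == "__main__":
    frontier = Frontier(policy=DEPTH_FIRST)
    for link in ["/a", "/b", "/c"]:
        frontier.push(link)
    print([frontier.pop() for _ in range(len(frontier))])
```

Breadth-first explores a site level by level, depth-first follows the most recently discovered links, and random-first spreads requests across whatever has been found so far.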