simple Spider

     _              _       ___       _    _
 ___<_>._ _ _  ___ | | ___ / __> ___ <_> _| | ___  _ _
<_-<| || ' ' || . \| |/ ._>\__ \| . \| |/ . |/ ._>| '_>
/__/|_||_|_|_||  _/|_|\___.<___/|  _/|_|\___|\___.|_|
              |_|               |_|

中文 (Chinese version)

Overview

A simple web crawling framework. Document

Getting Started

pip install sspider

Create a project.py to suit your needs:

   >>> from sspider import Spider, Request
   >>> # Create the request object
   >>> request = Request('get', 'https://movie.douban.com/subject/27202819/reviews')
   >>> # Create the spider object
   >>> spider = Spider()
   >>> # Run the spider
   >>> spider.run(request)
   ...
   >>> # Save the crawl results
   >>> spider.write('test.txt')

python project.py

Press Ctrl-C to stop.

Referenced Document

Referenced Libraries

Uses requests as the htmlDownloader
Uses lxml as the default htmlParser
Uses csv to export results as CSV files
Uses xlwt to export results as Excel (.xls) files
Uses xlsxwriter to export results as Excel (.xlsx) files
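
To illustrate the CSV export that the standard-library csv module makes possible, here is a minimal sketch. It does not show sspider's own code: the helper name export_rows_as_csv, the field names, and the sample rows are all made up for this example.

```python
import csv
import io


def export_rows_as_csv(rows, fieldnames):
    """Serialize a list of dict rows to CSV text.

    Hypothetical helper for illustration only; not part of sspider.
    """
    # newline="" leaves the csv module in charge of line endings.
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample data standing in for crawl results.
sample_rows = [
    {"title": "First review", "rating": 5},
    {"title": "Second, with a comma", "rating": 3},
]
csv_text = export_rows_as_csv(sample_rows, ["title", "rating"])
print(csv_text)
```

Fields that contain a comma are quoted automatically by the csv module, so review text does not need to be escaped by hand before it is written.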

Project structure

License

This project is published as open source under agreement. If you modify it, please keep your release open source and credit the original author. Thank you for your respect.

If you want to use this project for commercial purposes, please contact me (@pengr) separately to obtain commercial authorization.