/simple-spiders

A simple web crawling framework.

Primary language: Python. License: GNU General Public License v3.0 (GPL-3.0).

simple Spider

python: 3.4+ | coverage: 37% | build: passing

     _              _       ___       _    _
 ___<_>._ _ _  ___ | | ___ / __> ___ <_> _| | ___  _ _
<_-<| || ' ' || . \| |/ ._>\__ \| . \| |/ . |/ ._>| '_>
/__/|_||_|_|_||  _/|_|\___.<___/|  _/|_|\___|\___.|_|
              |_|               |_|

中文 (Chinese)

Overview

A simple web crawling framework. Document

Getting Started

pip install sspider

Write a project.py script to suit your needs:

   >>> from sspider import Spider, Request
   >>> # Create the request object
   >>> request = Request('get', 'https://movie.douban.com/subject/27202819/reviews')
   >>> # Create the spider object
   >>> spider = Spider()
   >>> # Run the spider
   >>> spider.run(request)
   ...
   >>> # Save the crawl results
   >>> spider.write('test.txt')

python project.py

Press Ctrl-C to stop.

Referenced Document

Referenced Libraries

  • Uses requests as the htmlDownloader
  • Uses lxml as the default htmlParser
  • Uses csv to export results as CSV files
  • Uses xlwt to export results as Excel (.xls) files
  • Uses xlsxwriter to export results as Excel (.xlsx) files
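
To illustrate the CSV export mentioned in the list above, here is a minimal sketch that uses only the standard-library csv module. It is not sspider's own export code: the helper name write_rows_as_csv and the sample records and field names are assumptions made up for this example.

```python
import csv
import io


def write_rows_as_csv(rows, fieldnames, stream):
    """Write a list of dicts to an open text stream as CSV (hypothetical helper)."""
    writer = csv.DictWriter(stream, fieldnames=fieldnames)
    writer.writeheader()      # first line: column names
    writer.writerows(rows)    # one line per record


# Made-up records standing in for crawled data; not real sspider output.
reviews = [
    {"author": "alice", "rating": 5},
    {"author": "bob", "rating": 3},
]

# An in-memory buffer keeps the sketch self-contained.
buffer = io.StringIO()
write_rows_as_csv(reviews, ["author", "rating"], buffer)
csv_text = buffer.getvalue()
```

To write to disk instead, pass a file opened with open('result.csv', 'w', newline='') so that the csv module controls the line endings.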

Project structure


License

This project is released as open source under the license agreement. If you modify it, please keep your release open source and credit the original author. Thank you for your respect.

If you need to use this project for commercial purposes, please contact me (@pengr) separately to obtain commercial authorization.