
Crawl book information from the web.


Book Spider

Overview

This repository crawls book information from Amazon and Douban using Python.

Features

BookHelper

The BookHelper class provides some basic methods, such as:

1. getAmazonAsinByIsbn
2. getAmazonIsbnByAsin
3. getAmazonAsinByTitleAndAuthor
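The ISBN/ASIN conversions can often be done offline. The sketch below is not the repo's implementation: it assumes the common Amazon convention that a printed book's ASIN equals its ISBN-10, and the function names `isbn_to_asin` and `asin_to_isbn13` are hypothetical. Looking up an ASIN by title and author still needs a real search request.

```python
# Hypothetical helpers, not this repo's code. Assumption: for printed books,
# the Amazon ASIN is usually the book's ISBN-10.

def isbn10_check_digit(first9: str) -> str:
    """ISBN-10 check digit: weights 10..2, mod 11 ('X' stands for 10)."""
    total = sum((10 - i) * int(d) for i, d in enumerate(first9))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def isbn13_check_digit(first12: str) -> str:
    """ISBN-13 check digit: alternating weights 1 and 3, mod 10."""
    total = sum((1 if i % 2 == 0 else 3) * int(d) for i, d in enumerate(first12))
    return str((10 - total % 10) % 10)


def isbn_to_asin(isbn: str) -> str:
    """Offline stand-in for getAmazonAsinByIsbn: return the ISBN-10 form."""
    digits = isbn.replace("-", "").replace(" ", "").upper()
    if len(digits) == 10:
        return digits
    if len(digits) == 13 and digits.startswith("978"):
        core = digits[3:12]  # drop the 978 prefix and the ISBN-13 check digit
        return core + isbn10_check_digit(core)
    raise ValueError(f"no ISBN-10 equivalent for {isbn!r}")


def asin_to_isbn13(asin: str) -> str:
    """Offline stand-in for getAmazonIsbnByAsin; only valid for ISBN-10 ASINs."""
    asin = asin.strip().upper()
    if len(asin) != 10 or not asin[:9].isdigit():
        raise ValueError(f"{asin!r} does not look like an ISBN-10")
    core = "978" + asin[:9]
    return core + isbn13_check_digit(core)
```

Kindle editions and other non-book items use ASINs such as `B0...` that have no ISBN, which is why `asin_to_isbn13` raises instead of guessing.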

Book

The Book class provides some basic methods, such as:

1. getAmazonBookInforByIsbn
2. getAmazonBookInforByAsin
3. getAmazonBookInforByTitleAndAuthor
4. getDoubanBookInforByIsbnOrSubjectId
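Each Book method ultimately needs a page URL to fetch. A minimal sketch, assuming the public URL patterns `https://www.amazon.com/dp/<ASIN>`, `https://book.douban.com/subject/<id>/` and `https://book.douban.com/isbn/<isbn>/` (the last assumed to redirect to the matching subject page). The helper names are hypothetical, and telling an ISBN apart from a Douban subject ID is done with a simple length heuristic.

```python
# Assumed URL patterns (public page layouts, not taken from this repo).
AMAZON_PRODUCT_URL = "https://www.amazon.com/dp/{asin}"
DOUBAN_SUBJECT_URL = "https://book.douban.com/subject/{subject_id}/"
DOUBAN_ISBN_URL = "https://book.douban.com/isbn/{isbn}/"


def amazon_book_url(asin: str) -> str:
    """Hypothetical helper: product page for an ASIN."""
    return AMAZON_PRODUCT_URL.format(asin=asin.strip())


def douban_book_url(isbn_or_subject_id: str) -> str:
    """Hypothetical helper: Douban page for an ISBN or a subject ID."""
    key = isbn_or_subject_id.replace("-", "").strip()
    # Heuristic: 10 or 13 characters is treated as an ISBN, anything else
    # as a Douban subject ID.
    if len(key) in (10, 13):
        return DOUBAN_ISBN_URL.format(isbn=key)
    return DOUBAN_SUBJECT_URL.format(subject_id=key)
```

Keeping the URL patterns in constants makes them easy to update if either site changes its layout.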

Result

[Image: book]

Notes

  • The '.UserAgentString.json' file contains 9502 PC browser user-agent strings and 512 mobile browser user-agent strings.

  • The CrawleraProxy service is disabled by default; you need to set the value of CRAWLERA_USER yourself. (For how to set CRAWLERA_USER, refer to the official website.)

  • I set the key for my local user by editing .bash_profile in my home directory, like this:

    # in ~/.bash_profile
    export CRAWLERA_USER=<KEY>
    # then reload it in the current shell
    source ~/.bash_profile
  • Then read CRAWLERA_USER in Python with the os module, like this:
    import os
    userkey = os.environ.get("CRAWLERA_USER")  # None if the variable is not set
  • By the way, the Crawlera key is not free!
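The notes above can be tied together in a small request helper: pick a random user agent and route traffic through Crawlera only when CRAWLERA_USER is set. This is a sketch under several assumptions, not the repo's code: the JSON layout `{"pc": [...], "mobile": [...]}` is a guess, all function names are hypothetical, and the proxy address `proxy.crawlera.com:8010` with the key as proxy username follows Crawlera's older documentation, so check the official docs (HTTPS through the proxy may need extra certificate setup).

```python
import json
import random
import urllib.request

# Assumptions (hypothetical, not confirmed by this repo):
# - .UserAgentString.json looks like {"pc": [...], "mobile": [...]}
# - Crawlera accepts the key as proxy username with an empty password.
CRAWLERA_PROXY = "http://{key}:@proxy.crawlera.com:8010"


def load_user_agents(path=".UserAgentString.json"):
    """Load the user-agent lists from the JSON file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def pick_user_agent(agents, mobile=False, rng=random):
    """Choose one PC or mobile user-agent string at random."""
    return rng.choice(agents["mobile" if mobile else "pc"])


def crawlera_proxies(env):
    """Proxy mapping, or None when CRAWLERA_USER is unset (the default)."""
    key = env.get("CRAWLERA_USER")
    if not key:
        return None
    url = CRAWLERA_PROXY.format(key=key)
    return {"http": url, "https": url}


def build_opener(env):
    """urllib opener that uses Crawlera only if a key is configured."""
    proxies = crawlera_proxies(env)
    handlers = [urllib.request.ProxyHandler(proxies)] if proxies else []
    return urllib.request.build_opener(*handlers)


def make_request(url, user_agent):
    """Request object carrying the chosen User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})
```

A typical call would be `build_opener(os.environ).open(make_request(url, pick_user_agent(load_user_agents())))`. Passing the environment in as a mapping keeps the proxy logic testable without touching real credentials.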