
All kinds of Crawls



Collects the Baidu Index for a particular person (keyword) over a particular time period.


  • BaiduIndex.py
    Main code
  • SQLTools.py
    Database access
  • ReadXml.py
    Tool for reading XML

Operating Environment

  • selenium
  • MySQLdb
  • pytesseract

Data Structure(MySQL)

CREATE TABLE `baidu_index` (
  `input_id` int(11) NOT NULL AUTO_INCREMENT,
  `status` int(11) NOT NULL,
  `keyword` varchar(50) DEFAULT NULL,
  `time` varchar(45) CHARACTER SET latin1 DEFAULT NULL,
  `index` longtext,
  PRIMARY KEY (`input_id`)
);

Example row:

input_id  status  keyword  time        index
1         0       GitHub   2016-03-01  ....

Operating Instructions

Prepare some data in the database, then run:
python BaiduIndex.py
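
As a rough sketch of the preparation step, the snippet below inserts one pending row (status 0) that matches the sample row in the Data Structure section. The helper names add_keyword and make_demo_db and the placeholder parameter are hypothetical and not part of this repository. sqlite3 stands in for MySQL only so the sketch runs without a server; with MySQLdb you would pass a MySQLdb connection and keep the default "%s" placeholder.

```python
import sqlite3

# Status 0 is the default value described in this README (not yet processed).
STATUS_PENDING = 0


def add_keyword(conn, keyword, time_value, placeholder="%s"):
    """Insert one pending keyword row and return its input_id.

    `placeholder` is "%s" for MySQLdb and "?" for sqlite3.
    """
    sql = (
        "INSERT INTO `baidu_index` (`status`, `keyword`, `time`) "
        "VALUES ({0}, {0}, {0})".format(placeholder)
    )
    cur = conn.cursor()
    cur.execute(sql, (STATUS_PENDING, keyword, time_value))
    conn.commit()
    return cur.lastrowid


def make_demo_db():
    """Create an in-memory SQLite stand-in for the MySQL table (demo only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE `baidu_index` ("
        "`input_id` INTEGER PRIMARY KEY AUTOINCREMENT, "
        "`status` INTEGER NOT NULL, "
        "`keyword` TEXT, "
        "`time` TEXT, "
        "`index` TEXT)"
    )
    return conn


if __name__ == "__main__":
    demo = make_demo_db()
    new_id = add_keyword(demo, "GitHub", "2016-03-01", placeholder="?")
    print("inserted input_id:", new_id)
```

The query is parameterized rather than built by string concatenation, so keywords containing quotes or non-ASCII characters are passed to the driver safely.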


Take "战狼2" (Wolf Warrior 2) as an example, with one row of data prepared for it.
The program requests baiduindex.com, then logs in with the accounts in the variable AccountList in BaiduIndex.py.
It then collects the Baidu Index from 2017-11-12 to 2018-01-12
and saves the result to your local database.
After all this, the status of the row with input_id=1 is set to 1 (the default value is 0).
If the keyword doesn't have any Baidu Index, the status is set to -1.
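
The status rules above can be summarized in a small sketch. The constant and function names are hypothetical and do not come from BaiduIndex.py, and the idea that the index is collected as one value per day is an assumption that this README does not confirm.

```python
from datetime import date, timedelta

# Status values described in this README.
STATUS_PENDING = 0    # default: row not processed yet
STATUS_DONE = 1       # index collected and saved
STATUS_NO_INDEX = -1  # keyword has no Baidu Index


def days_in_range(start, end):
    """Return every date from start to end, inclusive."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


def final_status(index_values):
    """Pick the status to store once a keyword has been processed."""
    return STATUS_DONE if index_values else STATUS_NO_INDEX


if __name__ == "__main__":
    days = days_in_range(date(2017, 11, 12), date(2018, 1, 12))
    print("days to collect:", len(days))
    print("status when nothing was found:", final_status([]))
```

Keeping the three status values as named constants makes it easier to select rows that still need processing (status 0) and to skip keywords already marked as done or as having no index.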

More Details

For more details about this code, visit my CSDN blog post 基于Selenium与图像识别的百度指数爬虫 ("A Baidu Index Crawler Based on Selenium and Image Recognition"), or download it.