BaiduIndexCrawl

Collects the Baidu Index of a given keyword (for example, a person's name) around a given date.

Main Code

BaiduIndex.py
Main crawler script
SQLTools.py
Database access helpers
ReadXml.py
Utility for reading XML

Operating Environment

selenium
MySQLdb
pytesseract

Data Structure (MySQL)

CREATE TABLE `baidu_index` (
  `input_id` int(11) NOT NULL AUTO_INCREMENT,
  `status` int(11) NOT NULL,
  `keyword` varchar(50) DEFAULT NULL,
  `time` varchar(45) CHARACTER SET latin1 DEFAULT NULL,
  `index` longtext,
  PRIMARY KEY (`input_id`)
) ENGINE=InnoDB AUTO_INCREMENT=10 DEFAULT CHARSET=utf8;

input_id	status	keyword	time	index
1	0	GitHub	2016-03-01	....

Operating Instructions

Insert the keywords and dates you want to crawl into the baidu_index table, then run:
python BaiduIndex.py
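
As a rough illustration of the "prepare some data" step, the sketch below builds a parameterized INSERT for the baidu_index table defined above. The helper name build_pending_row and the connection settings in the comments are hypothetical and not part of this repository; the only things taken from this README are the column names and the fact that status 0 marks a row that has not been crawled yet.

```python
from datetime import datetime

# Parameterized statement for the `baidu_index` table shown above.
# `input_id` is AUTO_INCREMENT and `index` is filled in by the crawler,
# so only the three input columns are listed here.
INSERT_SQL = (
    "INSERT INTO `baidu_index` (`status`, `keyword`, `time`) "
    "VALUES (%s, %s, %s)"
)


def build_pending_row(keyword, day):
    """Return the parameter tuple for one row that is waiting to be crawled.

    Hypothetical helper, not part of the repository.
    """
    # Raises ValueError if `day` is not a valid year-month-day date.
    datetime.strptime(day, "%Y-%m-%d")
    if len(keyword) > 50:
        raise ValueError("keyword does not fit into varchar(50)")
    # status 0 is the default "not crawled yet" state described in this README.
    return (0, keyword, day)


if __name__ == "__main__":
    rows = [build_pending_row("GitHub", "2016-03-01")]
    print(INSERT_SQL)
    print(rows)
    # With MySQLdb installed and a reachable server, the rows could be
    # written roughly like this (connection settings are placeholders):
    #   import MySQLdb
    #   conn = MySQLdb.connect(host="localhost", user="...", passwd="...",
    #                          db="...", charset="utf8")
    #   cur = conn.cursor()
    #   cur.executemany(INSERT_SQL, rows)
    #   conn.commit()
```

Passing the values as parameters instead of formatting them into the SQL string lets the driver handle quoting of keywords that contain special characters.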

Sample

Take "战狼2" (Wolf Warrior 2) as an example. Suppose the table holds a row like this:
[1,战狼2,2017-12-12]
The program requests baiduindex.com, then logs in with the accounts listed in the AccountList variable in BaiduIndex.py.
It then collects the Baidu Index from 2017-11-12 to 2018-01-12, that is, from one month before the given date to one month after it.
The result looks like this:
[2017-11-12:3930,2017-11-13:4040……]
The result is saved to your local database.
Once this is done, the status of the row with input_id=1 is set to 1 (the default value is 0).
If the keyword has no Baidu Index data, the status is set to -1.
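
The date range in the sample (2017-12-12 giving 2017-11-12 through 2018-1-12) suggests a window of one calendar month on each side of the stored date. The sketch below reproduces that arithmetic with the standard library only. The function names shift_month and crawl_window are hypothetical, and clamping to the last day of a shorter month (for example, 31 March minus one month) is an assumption; the actual behavior of BaiduIndex.py for such dates is not documented here.

```python
import calendar
from datetime import date, datetime


def shift_month(day, months):
    """Move a date by whole months.

    Assumption: if the target month is shorter, clamp to its last day.
    """
    # Count months from year 0 so that year rollover is plain arithmetic.
    month_index = day.year * 12 + (day.month - 1) + months
    year, month_zero_based = divmod(month_index, 12)
    month = month_zero_based + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day.day, last_day))


def crawl_window(day_text):
    """Return (start, end): one month before and after the given date."""
    center = datetime.strptime(day_text, "%Y-%m-%d").date()
    return shift_month(center, -1), shift_month(center, 1)


if __name__ == "__main__":
    start, end = crawl_window("2017-12-12")
    print(start, end)  # 2017-11-12 2018-01-12
```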

More Details

For more details about this code, see my CSDN blog post 基于Selenium与图像识别的百度指数爬虫 ("A Baidu Index crawler based on Selenium and image recognition"), or download the code.