Webcrawler Stackoverflow

Python script to captures data from questions of stackoverflow and save the information in MySQL database.

Requirements

Python 3
PyMySQL
Unidecode
Requests

Install

$ brew install python3

$ pip3 install bs4

$ pip3 install pymysql

$ pip3 install unidecode

$ pip3 install requests

Create your database and insert the table

CREATE TABLE `questions` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `title` varchar(355) DEFAULT NULL,
  `link` varchar(500) DEFAULT NULL,
  `answers` int(8) DEFAULT NULL,
  `views` int(9) DEFAULT NULL,
  `answered_accepted` tinyint(1) DEFAULT NULL,
  `description` text,
  `stackoverflow_id` int(12) DEFAULT NULL,
  `creation_time` datetime DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  UNIQUE KEY `stackoverflow_id` (`stackoverflow_id`)
) ENGINE=MyISAM AUTO_INCREMENT=446564 DEFAULT CHARSET=utf8;

Please change the strings inside file [/core/dbconnection.py] to connect database

self.db = pymysql.connect("localhost", "username", "password", "database")

imperd1x/webcrawler-stackoverflow

Webcrawler Stackoverflow

Requirements

Install