/webcrawler-stackoverflow

webcrawler to get all questions from website stackoverflow

Primary LanguagePythonMIT LicenseMIT

Software License

Webcrawler Stackoverflow

Python script to captures data from questions of stackoverflow and save the information in MySQL database.

alt text

alt text

alt text

Requirements

Install

$ brew install python3
$ pip3 install bs4
$ pip3 install pymysql
$ pip3 install unidecode
$ pip3 install requests

Create your database and insert the table

CREATE TABLE `questions` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `title` varchar(355) DEFAULT NULL,
  `link` varchar(500) DEFAULT NULL,
  `answers` int(8) DEFAULT NULL,
  `views` int(9) DEFAULT NULL,
  `answered_accepted` tinyint(1) DEFAULT NULL,
  `description` text,
  `stackoverflow_id` int(12) DEFAULT NULL,
  `creation_time` datetime DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  UNIQUE KEY `stackoverflow_id` (`stackoverflow_id`)
) ENGINE=MyISAM AUTO_INCREMENT=446564 DEFAULT CHARSET=utf8;

Please change the strings inside file [/core/dbconnection.py] to connect database

self.db = pymysql.connect("localhost", "username", "password", "database")