stack_analyser

A Python project for the Business Intelligence course that analyses questions on Stack Overflow. It has two main parts: a spider based on Scrapy and a web application based on Flask.

Introduction

The project aims to find out which questions are asked most often each day on the well-known Q&A site Stack Overflow, and how they change over time. Crawling the questions is fairly simple: Stack Overflow has no anti-spider policy, so it's easy to collect millions of questions with Scrapy. Storing that much data is harder. Because the data is in JSON format, I chose MongoDB. To present the analysis results, I use Flask to build a simple web application, which uses Highcharts to display the results in various graphs and charts.
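As a rough illustration of the "most asked per day" idea, here is a minimal sketch that counts tags per day over a few in-memory records. The record layout (the `tags` and `date` fields), the sample values, and the helper names `daily_tag_counts` and `hottest` are assumptions made for this example; they are not taken from the project's code or its MongoDB schema.

```python
from collections import Counter, defaultdict

# Hypothetical question records. The field names ("tags", "date") and the
# values are made up for illustration, not the project's actual schema.
questions = [
    {"title": "q1", "tags": ["python", "flask"], "date": "2015-05-01"},
    {"title": "q2", "tags": ["python"], "date": "2015-05-01"},
    {"title": "q3", "tags": ["java"], "date": "2015-05-02"},
]


def daily_tag_counts(records):
    """Return {date: Counter({tag: count})} for the given question records."""
    counts = defaultdict(Counter)
    for record in records:
        # Counter.update adds one occurrence for each tag in the list.
        counts[record["date"]].update(record["tags"])
    return dict(counts)


def hottest(counts, date, n=1):
    """Return the n most frequent (tag, count) pairs for one day."""
    return counts[date].most_common(n)


counts = daily_tag_counts(questions)
print(hottest(counts, "2015-05-01"))  # [('python', 2)]
```

In the real project the records would come from MongoDB rather than a list literal, and the per-day counts would feed the charts.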

The following diagram shows the structure of the whole project.

Screenshot

hottest topics

how much share they account for

how they change with time

custom analysis

Setup

clone the project.
install MongoDB (skip if already installed) and create the database stack_db in MongoDB.
install MySQL (skip if already installed), create the database stack_db, and create the table tag(id, tagname, tag_count, date) in stack_db.
open both stack_analyser and stack_spider in PyCharm.
run stack_spider to crawl questions from Stack Overflow.
run static_cache.py in stack_analyser to compute statistics and transfer the data.
run stack_analyser to see the final result.
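The steps above name the columns of the tag table but not their types. The sketch below shows one plausible definition. The column types are assumptions, and an in-memory SQLite database stands in for MySQL so that the example runs without a server. For MySQL you would likely also add AUTO_INCREMENT to id and run the statement through a MySQL client instead.

```python
import sqlite3

# Column types are assumptions; the setup steps only name the columns.
CREATE_TAG_TABLE = """
CREATE TABLE tag (
    id INTEGER PRIMARY KEY,
    tagname VARCHAR(64),
    tag_count INTEGER,
    date DATE
)
"""

# In-memory SQLite is a stand-in for MySQL so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.execute(CREATE_TAG_TABLE)

# Insert one made-up row to check that the table accepts the expected shape.
conn.execute(
    "INSERT INTO tag (tagname, tag_count, date) VALUES (?, ?, ?)",
    ("python", 2, "2015-05-01"),
)
rows = conn.execute("SELECT tagname, tag_count, date FROM tag").fetchall()
print(rows)  # [('python', 2, '2015-05-01')]
```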
