stack_analyser
A Python project for the Business Intelligence course that analyses questions on Stack Overflow. It consists of two main parts: a spider based on Scrapy and a web application based on Flask.
Introduction
The project aims to find out which questions are asked most every day, and how they change over time, on the famous Q&A site Stack Overflow. Crawling all the questions is fairly simple, since Stack Overflow has no anti-spider policy, so it's easy to fetch millions of questions with Scrapy. Storing that much data is less straightforward, though; since the data is in JSON format, I chose MongoDB. To present the analysis results, I use Flask to build a simple web application, in which I use Highcharts to show the results in different graphs and charts.
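As a rough illustration of the crawl-and-store step, the sketch below normalizes one scraped question (parsed from JSON) into the document that would be inserted into MongoDB. The field names (`title`, `tags`, `creation_date`) and the lowercase-tag convention are assumptions for illustration; the real spider's item schema may differ, and the actual insert would go through pymongo.

```python
import json
from datetime import datetime, timezone

def normalize_question(raw: dict) -> dict:
    """Turn one scraped question into the document stored in stack_db.

    The field names here are illustrative assumptions; the real spider
    may use a different item schema.
    """
    return {
        "title": raw.get("title", "").strip(),
        # lowercase tags so later counting treats "Python" and "python" alike
        "tags": [t.lower() for t in raw.get("tags", [])],
        # keep a date string so later statistics can group by day
        "date": raw.get("creation_date",
                        datetime.now(timezone.utc).strftime("%Y-%m-%d")),
    }

# Example: one question as the spider might yield it.
raw = json.loads('{"title": " How to sort a dict? ", '
                 '"tags": ["Python", "sorting"], '
                 '"creation_date": "2016-05-01"}')
doc = normalize_question(raw)
print(doc)
```

In a real run the resulting `doc` would be passed to something like `collection.insert_one(doc)` on a pymongo collection inside the `stack_db` database.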
The following diagram shows the structure of the whole project.
Screenshot
A chart showing how much share each tag accounts for.
Setup
- clone the project.
- install MongoDB (skip if already installed) and create the database stack_db in it.
- install MySQL (skip if already installed), create the database stack_db, and create the table tag(id, tagname, tag_count, date) in stack_db.
- open both stack_analyser and stack_spider with PyCharm.
- run stack_spider to crawl questions from Stack Overflow.
- run static_cache.py in stack_analyser to compute statistics and transfer the data.
- run stack_analyser to see the final result.
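The core of the statistics step (counting how often each tag appears on a given day and producing rows shaped like the MySQL tag(tagname, tag_count, date) table) might look roughly like this sketch; the input document shape is an assumption, not the actual code of static_cache.py.

```python
from collections import Counter

def tag_counts_per_day(questions):
    """Count tag occurrences per day.

    `questions` is an iterable of MongoDB-style documents with a `tags`
    list and a `date` string (an assumed schema); returns rows of
    (tagname, tag_count, date) matching the MySQL `tag` table.
    """
    counters = {}  # date -> Counter of tag occurrences on that day
    for q in questions:
        counters.setdefault(q["date"], Counter()).update(q["tags"])
    rows = []
    for date in sorted(counters):
        # most_common() orders tags by count, most-asked first
        for tagname, count in counters[date].most_common():
            rows.append((tagname, count, date))
    return rows

questions = [
    {"tags": ["python", "flask"], "date": "2016-05-01"},
    {"tags": ["python"], "date": "2016-05-01"},
    {"tags": ["mongodb"], "date": "2016-05-02"},
]
rows = tag_counts_per_day(questions)
print(rows)
```

Each resulting row could then be inserted into the MySQL tag table, from which the Flask application reads the data it feeds to Highcharts.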