Forum Question Analyzer

This project uses Big Data techniques to analyze questions from a Stack Exchange forum. Using the Stack Exchange Data Dump from archive.org, it predicts whether a question will receive an accepted answer.

Data Source

The data comes from the Stack Exchange Data Dump, available on archive.org. Our project uses the TeX forum data (tex.stackexchange.com).
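
As a rough illustration of the prediction target, the sketch below reads question rows from a Posts.xml file in the dump and marks each one as having an accepted answer or not. It relies on the dump's row attributes PostTypeId (1 for questions) and AcceptedAnswerId (present only when an answer was accepted). The function name iter_question_labels and the inline sample are made up for this example, and our notebooks may derive the label differently.

```python
import io
import xml.etree.ElementTree as ET


def iter_question_labels(source):
    """Yield (question_id, has_accepted_answer) for every question row.

    `source` is a path or a binary file object with Posts.xml content.
    Hypothetical helper, not taken from the project notebooks.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        # PostTypeId "1" marks a question; answers use "2".
        if elem.tag == "row" and elem.get("PostTypeId") == "1":
            accepted = elem.get("AcceptedAnswerId") is not None
            yield int(elem.get("Id")), accepted
        # Drop the element's attributes once it has been handled.
        elem.clear()


# Tiny hand-written sample in the shape of Posts.xml (not real data).
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<posts>
  <row Id="1" PostTypeId="1" AcceptedAnswerId="3" Title="a" />
  <row Id="2" PostTypeId="1" Title="b" />
  <row Id="3" PostTypeId="2" ParentId="1" />
</posts>"""

labels = dict(iter_question_labels(io.BytesIO(SAMPLE)))
print(labels)
```

Streaming with iterparse avoids loading the whole file at once, which matters for the full dump. For the real data, pass the path to Posts.xml instead of the in-memory sample.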

Setup

1. Download the Stack Exchange Data Dump from archive.org.
2. Extract the data dump into the tex.stackexchange.com folder.
3. Install Python 3.8 or higher.
4. Install Spark (3.5.0 recommended).
5. Install the required dependencies with pip install -r requirements.txt
6. Run jupyter notebook and open analysis.ipynb, features.ipynb, or statistics.ipynb to see the results of our analysis.
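
Before opening the notebooks, a quick sanity check along these lines can catch a missing step. It is only a sketch: check_setup is a hypothetical helper, and it assumes the extracted dump contains Posts.xml directly inside the tex.stackexchange.com folder. It does not verify the Spark installation.

```python
import sys
from pathlib import Path


def check_setup(data_dir="tex.stackexchange.com", min_python=(3, 8)):
    """Return a list of human-readable problems; an empty list means OK.

    Hypothetical helper; the file layout is an assumption.
    """
    problems = []
    if sys.version_info < min_python:
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python {wanted} or higher is required")
    # Assumption: Posts.xml sits at the top level of the data folder.
    if not (Path(data_dir) / "Posts.xml").is_file():
        problems.append(f"Posts.xml not found in {data_dir}")
    return problems


if __name__ == "__main__":
    for problem in check_setup():
        print("Setup problem:", problem)
```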

Results

Our model achieved an accuracy of 70.92% in predicting whether a question receives an accepted answer.
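
For readers less familiar with the metric, accuracy is the share of questions whose predicted label matches the true one. A useful point of comparison is the majority-class baseline, that is, the accuracy of always predicting the more common label. The sketch below shows both on made-up toy labels; the values are illustrative only and are not our project's data or results.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline(y_true):
    """Accuracy of always predicting the most common label."""
    if not y_true:
        raise ValueError("label list must be non-empty")
    positives = sum(y_true)
    return max(positives, len(y_true) - positives) / len(y_true)


# Toy labels: True means the question has an accepted answer.
y_true = [True, True, False, True, False, True, True, False]
y_pred = [True, False, False, True, True, True, True, False]

print(f"accuracy: {accuracy(y_true, y_pred):.2%}")
print(f"majority baseline: {majority_baseline(y_true):.2%}")
```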

Contributors

Krzysztof Mizgała
Julia Czerniecka
Wiktoria Gałdusińska
Jerzy Grunwald
Maciej Kosierb

Feel free to contribute and improve our project!
