/spark_analysis_of_public_data_from_askubuntu

This project analyzes questions on askubuntu.com to find the most commonly asked-about topics, in order to better understand which areas of Ubuntu may need more attention for bug fixing and which features might be worth adding in future releases. To do this, I analyzed public data from askubuntu.com using Azure HDInsight with Spark. Tags were the most useful signal; word counts of the titles and body text were less useful. Future research might use a natural language processing library such as NLTK to better identify the topics asked about and the types of questions asked for each topic. Disclaimer: I'm in no way affiliated with Ubuntu. This was done for personal learning.
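As a rough illustration of the tag-counting idea, here is a minimal sketch in plain Python rather than Spark. It is not the project's actual code. It assumes each question carries its tags as a single string of the form `<tag1><tag2>`, and the sample rows, `TAG_PATTERN`, and `count_tags` are hypothetical names and data invented for this example.

```python
import re
from collections import Counter

# Assumption: tags arrive as one string per question, e.g. "<boot><grub2>".
TAG_PATTERN = re.compile(r"<([^<>]+)>")


def count_tags(tag_strings):
    """Count how often each tag appears across all questions."""
    counts = Counter()
    for tags in tag_strings:
        # Split one question's tag string into individual tag names.
        counts.update(TAG_PATTERN.findall(tags))
    return counts


# Hypothetical sample data, not taken from the real data set.
sample_rows = [
    "<boot><grub2>",
    "<networking><wireless>",
    "<boot><dual-boot>",
]

tag_counts = count_tags(sample_rows)
top_tags = tag_counts.most_common(1)  # most frequent tag with its count
print(top_tags)
```

On a cluster, the same shape would likely map to splitting each row into tags and then summing counts per tag (for example with an RDD `flatMap` followed by `reduceByKey`), but that mapping is an assumption about how one might structure it, not a description of this repository.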

Primary language: Python. License: MIT.
