Exploratory data analysis of the Twitter stream from March 2016 using the big data tool Apache Spark with Python, applying machine learning and data mining techniques such as LDA, network analysis, and clustering.
I personally worked on the network analysis and LDA.
- Start with readme_main.pdf for instructions on running the Spark-specific Jupyter Notebook, which covers the pre-processing and most of the analysis.
- Then see readme_local.md for instructions on running the Jupyter Notebook meant for a local machine, which contains a couple of analyses based on the outputs of the Spark pre-processing.

We had to split the work across two separate notebooks because of a few limitations we ran into with Microsoft Azure HDInsight.