twitter-march2016-analysis

Exploratory data analysis of the Twitter stream from March 2016 using the big data tool Apache Spark with Python, applying machine learning and data mining techniques such as LDA, network analysis, and clustering on Microsoft Azure HDInsight.

I personally worked on the network analysis and LDA.
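To give a feel for what network analysis on tweets can look like, here is a minimal sketch in plain Python. It is not the code from the notebooks: the sample tweets, the `user` and `text` field names, and the `mention_edges` helper are all assumptions made for illustration. It builds a weighted mention graph (who mentions whom) and counts how often each account is mentioned.

```python
import re
from collections import Counter

# Hypothetical sample tweets; the real data is the March 2016 Twitter stream.
# The "user" and "text" field names are assumptions for this sketch.
tweets = [
    {"user": "alice", "text": "Great talk by @bob and @carol!"},
    {"user": "bob", "text": "Thanks @alice"},
    {"user": "carol", "text": "@bob see you there"},
]

MENTION = re.compile(r"@(\w+)")


def mention_edges(tweets):
    """Yield one (author, mentioned_user) pair per mention in each tweet."""
    for tweet in tweets:
        for mentioned in MENTION.findall(tweet["text"]):
            yield tweet["user"].lower(), mentioned.lower()


# Edge weights: how many times each author mentioned each target.
edges = Counter(mention_edges(tweets))

# Weighted in-degree: how many times each account was mentioned overall.
in_degree = Counter()
for (_, target), weight in edges.items():
    in_degree[target] += weight

most_mentioned = in_degree.most_common(1)[0]
print(most_mentioned)
```

On a full month of tweets the same edge-counting step would be done in Spark rather than in a local loop; the instructions below point to the notebooks that do the actual work.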

Instructions

  • Start with readme_main.pdf for instructions on running the Spark-specific Jupyter Notebook, which covers the pre-processing and most of the analysis.

  • Then see readme_local.md for instructions on running the local-machine Jupyter Notebook, which contains a couple of analyses based on the outputs of the Spark pre-processing.

We had to split the work into two separate notebooks because of a few limitations we ran into with Microsoft Azure HDInsight.