The data set, titled “Political Social Media Posts”, was retrieved from Kaggle.com and consists of 5,000 observations with 21 variables. We analyze it through questions that involve natural language processing of one or more variables. Our ultimate goal is to measure the sentiment of each tweet and see how it varies by attributes such as audience, message, and political party. In addition, we fit regression and classification models using the controversiality label to see whether it can predict bias.
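To make the sentiment-by-attribute idea concrete, here is a minimal sketch in plain Python. It is not the method used in the notebook: the tiny word lists are a hypothetical stand-in for a real sentiment model, the field names `text`, `audience`, and `message` are assumptions about the column names, and the three rows are made-up examples rather than records from the data set.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy lexicon; a real analysis would use a trained sentiment model.
POSITIVE = {"great", "proud", "support", "thank", "win"}
NEGATIVE = {"bad", "fail", "oppose", "attack", "wrong"}


def sentiment_score(text):
    """Return (positive hits - negative hits) / token count, a value in [-1, 1]."""
    tokens = [t.strip(".,!?;:\"'()").lower() for t in text.split()]
    tokens = [t for t in tokens if t]
    if not tokens:
        return 0.0
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / len(tokens)


def mean_sentiment_by(rows, attribute, text_field="text"):
    """Average the sentiment score of each row, grouped by one attribute."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(sentiment_score(row[text_field]))
    return {key: mean(scores) for key, scores in groups.items()}


# Made-up rows for illustration only; field names are assumed, not confirmed.
posts = [
    {"text": "Proud to support our veterans!", "audience": "national", "message": "support"},
    {"text": "This bill is wrong and will fail.", "audience": "national", "message": "attack"},
    {"text": "Thank you for a great town hall.", "audience": "constituency", "message": "personal"},
]

by_audience = mean_sentiment_by(posts, "audience")
by_message = mean_sentiment_by(posts, "message")
```

Passing a different attribute name, such as a party column, to `mean_sentiment_by` gives the same comparison for that attribute.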
Python Notebook: https://github.com/wafer110/Python-NLP-Analyze_TextualData_on_Reddit_Comments/blob/master/%5Bwh%5D%20FinalProj_Coding.ipynb
Project Presentation: https://github.com/wafer110/Python-NLP-Analyze_TextualData_on_Reddit_Comments/blob/master/NLP%20on%20Reddit%20Comments.pdf
Project Report: https://github.com/wafer110/Python-NLP-Analyze_TextualData_on_Reddit_Comments/blob/master/Alex_Wafer_FinalReport.pdf