Political Profiling using Feature Engineering and NLP

A Capstone Project

By: Chiranjeevi Mallavarapu, Ramya Mandava, Sabitri KC

Advisor: Ginger Holt, Facebook

Public surveys are the predominant tool for forecasting election outcomes. Although this approach has had significant successes, surveys have also had notable failures, particularly in accuracy and reliability. As a result, political parties find it difficult to spend their campaign budgets in a way that builds favorable and verifiable public opinion, so a more accurate methodology for predicting election outcomes is needed. In this paper, we evaluate the impact of using dynamic public data to predict election outcomes. Our model yielded a Pearson correlation of 0.71 between the derived features and the predicted percentage of votes. We conclude that candidates who are affiliated with notable organizations and who participate in public events are substantially more likely to win. A Wikipedia presence containing notable keywords such as 'Congress', 'Republican', and 'House' also helps candidates stand out in their constituencies. Together, these features explain 80 percent of the variation in the response variable, which in this case is the percentage of votes each candidate receives in their constituency.
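For readers unfamiliar with the two statistics quoted in the abstract, the following is a minimal sketch of how a Pearson correlation and a proportion of explained variation (R squared) can be computed. It does not reproduce the project's pipeline: the function names `pearson` and `r_squared` and the vote-share numbers are hypothetical and chosen only for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of cross-products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def r_squared(actual, predicted):
    """Share of the variation in `actual` accounted for by `predicted`."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty, equal-length sequences")
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R squared is undefined when `actual` is constant")
    return 1 - ss_res / ss_tot


# Hypothetical vote shares (percent) for five candidates; NOT data from the paper.
actual_share = [55.0, 30.0, 58.5, 50.0, 44.0]
predicted_share = [52.0, 35.5, 61.0, 48.0, 40.0]

print("Pearson r:", round(pearson(predicted_share, actual_share), 3))
print("R squared:", round(r_squared(actual_share, predicted_share), 3))
```

The two quantities answer different questions: the correlation measures how closely two series move together, while R squared measures how much of the variation in the observed vote share a model's predictions account for, so they need not agree numerically.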