Polling the 2016 Presidential Race with Twitter and Natural Language Processing: candidates.drewatkinson.me
A.
- The map of states takes all geocoded positive tweets and uses each candidate's average share of those tweets as a rough "vote share" for every state.
- The popularity graph charts the popularity of each candidate, counted by number of tweets per hour in 5-minute increments.
- The sentiment graph charts the percentage of tweets about each candidate that are positive, calculated every 5 minutes by a natural language classifier (Naive Bayes).
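
As a rough illustration of the 5-minute aggregation behind the popularity and sentiment graphs, here is a minimal sketch. It is not the project's actual code: the tweet shape (`candidate`, `positive`, `createdAt` in epoch milliseconds) and the function name `sentimentByBucket` are hypothetical, and scaling a 5-minute count by 12 is only one reading of "tweets per hour".

```javascript
// Hypothetical tweet shape: { candidate, positive, createdAt } where
// createdAt is epoch milliseconds. None of these names come from the project.
const BUCKET_MS = 5 * 60 * 1000;

// Groups tweets into 5-minute windows per candidate and reports the
// tweet count and the percentage classified as positive in each window.
function sentimentByBucket(tweets) {
  const buckets = new Map();
  for (const tweet of tweets) {
    // Round the timestamp down to the start of its 5-minute window.
    const start = Math.floor(tweet.createdAt / BUCKET_MS) * BUCKET_MS;
    const key = tweet.candidate + '|' + start;
    const bucket = buckets.get(key) ||
      { candidate: tweet.candidate, start: start, total: 0, positive: 0 };
    bucket.total += 1;
    if (tweet.positive) bucket.positive += 1;
    buckets.set(key, bucket);
  }
  return Array.from(buckets.values()).map(function (b) {
    return {
      candidate: b.candidate,
      start: b.start,
      count: b.total,
      // Assumption: extrapolate the 5-minute count to an hourly rate.
      tweetsPerHour: b.total * 12,
      percentPositive: (100 * b.positive) / b.total,
    };
  });
}
```

Each returned row corresponds to one point on the charts: `tweetsPerHour` for popularity and `percentPositive` for sentiment.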
A. I wanted to use my knowledge of Node.js and Express, but also learn to incorporate a SQL database. I have also been getting more interested in natural language processing and data science. The natural intersection of all of these things was a project involving the presidential election. I chose Twitter as the source of the data because I think that traditional phone polls will eventually become less reliable as fewer people have landline phones. I think polling general public opinion online is an important step in the political process that we need to figure out, and work has already started with projects like BeHeardPhilly.
A. Probably not at all. Here's what needs to be improved:
- The Naive Bayes model that classifies tweets as positive or negative is trained on a very small sample taken from a single day.
- The data should be averaged or fit to a regression to give a better overview of these metrics over longer periods of time.
- Only a very small fraction of tweets (less than 1%) are encoded with geographic coordinates, so there is a very small sample size to work with.
- In addition to the above, the state a tweet belongs to is determined by the shortest distance to the average coordinates of each state. This could be improved easily with a geocoding service or by taking each state's entire area into account.
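
To make the last limitation concrete, here is a minimal sketch of nearest-centroid state assignment. It is an assumption-laden illustration, not the project's code: the function names are hypothetical, the centroid values are approximate placeholders rather than the MaxMind data, and the haversine formula is one choice of distance measure since the original does not say which is used.

```javascript
// Illustrative, approximate centroids (hypothetical values, NOT the MaxMind data).
const CENTROIDS = {
  PA: { lat: 40.59, lon: -77.21 },
  NJ: { lat: 40.30, lon: -74.52 },
  NY: { lat: 42.17, lon: -74.95 },
};

const EARTH_RADIUS_KM = 6371;

function toRad(deg) {
  return (deg * Math.PI) / 180;
}

// Great-circle distance between two points using the haversine formula.
function haversineKm(lat1, lon1, lat2, lon2) {
  const sinLat = Math.sin(toRad(lat2 - lat1) / 2);
  const sinLon = Math.sin(toRad(lon2 - lon1) / 2);
  const a = sinLat * sinLat +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * sinLon * sinLon;
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
}

// Returns the key of the centroid closest to the given coordinates.
function nearestState(lat, lon, centroids) {
  const table = centroids || CENTROIDS;
  let best = null;
  let bestKm = Infinity;
  Object.keys(table).forEach(function (state) {
    const km = haversineKm(lat, lon, table[state].lat, table[state].lon);
    if (km < bestKm) {
      best = state;
      bestKm = km;
    }
  });
  return best;
}
```

With these placeholder centroids, `nearestState(39.95, -75.17)` (roughly central Philadelphia) comes out as `NJ` rather than `PA`, because the point sits closer to New Jersey's centroid. That is the kind of misassignment a geocoding service or a point-in-polygon check against state boundaries would avoid.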
- Leaflet JavaScript maps
- Chartist SVG charts
- Moment.js time library
- Lodash
- jQuery
- Bootstrap
- Font Awesome v4
- GitHub Corners by Tim Holman
- Node.js v6.3
- PostgreSQL and pg-promise
- Async
- Express web framework and its generator
- Natural, a natural language processing library for Node.js
- Sass preprocessor and its Express middleware
- Pug templating library
- Twitter Node library
- Twitter Streaming API
- GeoJSON data from Mike Bostock through Leaflet
- Average latitude and longitude for US states from MaxMind
- OpenStreetMap map data
- Mapbox map imagery
vitaly-t (author of pg-promise) for the pull requests and for helping me understand PostgreSQL