Twitter political troll classifier built in R

PolitiTroll

Advanced Data Science

"Twitter Project"

To rerun the analysis, run file 8 (8_visuals.R) or rerun the lab_rmd.rmd file in the ~/reports subdirectory.
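Both options can be run from an R session as shown in the sketch below. It assumes that the working directory is the repository root and that the rmarkdown package is installed. The helper `rerun_analysis()` is hypothetical and is not part of the repository.

```r
# Hypothetical helper: rerun the analysis either by sourcing file 8 or by
# rendering the R Markdown report. Paths assume `root` is the repository root.
rerun_analysis <- function(root = ".", via_report = FALSE) {
  if (via_report) {
    report <- file.path(root, "reports", "lab_rmd.rmd")
    if (!file.exists(report)) stop("Report not found: ", report)
    rmarkdown::render(report)
  } else {
    script <- file.path(root, "rscripts", "8_visuals.R")
    if (!file.exists(script)) stop("Script not found: ", script)
    source(script)
  }
}

# From the repository root:
# rerun_analysis()                    # run 8_visuals.R
# rerun_analysis(via_report = TRUE)   # re-render reports/lab_rmd.rmd
```

The helper checks for each file before running it, so a wrong working directory fails with a clear message instead of a partial run.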

If you would like to collect trolls yourself (using the Twitter API), you must provide a file called hidden.r containing your Twitter API keys so that the twitteR API wrapper can authenticate.

Most of the magic happens in the ~/rscripts subdirectory. Here is an overview:

0_hidden.R

This file contains our API keys and authenticates our session for us.
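The sketch below shows what such a file might contain. It assumes the keys are stored in environment variables rather than typed into the script, which keeps them out of version control. The variable names are hypothetical. `setup_twitter_oauth()` is the twitteR package's authentication function. The guard skips authentication when either the package or the keys are missing.

```r
# 0_hidden.R (sketch): read Twitter API keys and authenticate twitteR.
# The environment variable names are hypothetical; use whatever you set.
twitter_keys <- c(
  consumer_key    = Sys.getenv("TWITTER_CONSUMER_KEY"),
  consumer_secret = Sys.getenv("TWITTER_CONSUMER_SECRET"),
  access_token    = Sys.getenv("TWITTER_ACCESS_TOKEN"),
  access_secret   = Sys.getenv("TWITTER_ACCESS_SECRET")
)

if (all(nzchar(twitter_keys)) && requireNamespace("twitteR", quietly = TRUE)) {
  # Authenticate the session so later search and user-lookup calls succeed.
  twitteR::setup_twitter_oauth(
    consumer_key    = twitter_keys[["consumer_key"]],
    consumer_secret = twitter_keys[["consumer_secret"]],
    access_token    = twitter_keys[["access_token"]],
    access_secret   = twitter_keys[["access_secret"]]
  )
} else {
  message("Twitter keys or the twitteR package are missing; not authenticating.")
}
```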

1_get_tweets_now_func.R

This file defines several functions, most importantly get_those_tweets_meow(), which searches the Twitter API for tweets matching a given string and collects information about the user who sent each tweet.
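The README does not document the signature of get_those_tweets_meow(). The sketch below implements the same idea using twitteR's `searchTwitter()`, `lookupUsers()` and `twListToDF()`. The names `search_with_users()` and `attach_user_info()` are hypothetical, and the sketch assumes the session has already been authenticated. The joining step is kept separate so it can be checked without network access.

```r
# Join user-level fields onto tweets by screen name. Columns present in
# both tables (e.g. `id`, `created`) get "_tweet" / "_user" suffixes.
attach_user_info <- function(tweets, users) {
  merge(tweets, users, by = "screenName",
        all.x = TRUE, suffixes = c("_tweet", "_user"))
}

# Search for tweets matching `query`, then look up each distinct author.
search_with_users <- function(query, n = 100) {
  statuses <- twitteR::searchTwitter(query, n = n)
  tweets   <- twitteR::twListToDF(statuses)
  users    <- twitteR::twListToDF(twitteR::lookupUsers(unique(tweets$screenName)))
  attach_user_info(tweets, users)
}

# Example (requires an authenticated session):
# search_with_users("some search string", n = 200)
```

Each user is looked up only once with `unique()`, even if they sent many of the matching tweets. The merge then copies their details onto every one of their tweets.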

2_selected_trolls.R

This file collects the same information as 1_get_tweets_now_func.R, but for pre-identified users.

3_gettingdata.R

This file uses the functions from files 1 and 2 to collect data. You can specify the strings you are searching for (e.g. 'pizza hut isn't that good but what else am I going to eat in toledo'). The script creates a file, "outputs/classifyme_api.csv", in which a human can read the tweets and decide whether each user is a troll. After you add a troll column (1 = troll, 0 = non-troll) and save the file, the script binds this troll column to the rest of the data.
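The sketch below shows one way to do this hand-labelling round trip. It uses two hypothetical helpers, `export_for_labeling()` and `import_labels()`. It assumes the data frame has `screenName` and `text` columns and that the labeller does not reorder rows, so the labels can be bound back by position. The `screenName` check catches the case where rows were reordered.

```r
# Write tweets out with an empty `troll` column for a human to fill in.
export_for_labeling <- function(tweets, path) {
  to_label <- tweets[, c("screenName", "text")]
  to_label$troll <- NA_integer_  # fill by hand: 1 = troll, 0 = non-troll
  write.csv(to_label, path, row.names = FALSE)
  invisible(path)
}

# Read the labelled file back and bind its `troll` column to the data.
import_labels <- function(tweets, path) {
  labelled <- read.csv(path, stringsAsFactors = FALSE)
  if (!identical(labelled$screenName, tweets$screenName)) {
    stop("Rows were reordered or removed while labelling.")
  }
  if (!all(labelled$troll %in% c(0, 1))) {
    stop("Every row needs a troll label of 0 or 1.")
  }
  tweets$troll <- as.integer(labelled$troll)
  tweets
}

# Usage with toy data; the read/edit/write in the middle stands in for the human.
tweets <- data.frame(screenName = c("user_a", "user_b"),
                     text = c("first tweet", "second tweet"),
                     stringsAsFactors = FALSE)
path <- file.path(tempdir(), "classifyme_api.csv")
export_for_labeling(tweets, path)
labels <- read.csv(path, stringsAsFactors = FALSE)
labels$troll <- c(1, 0)
write.csv(labels, path, row.names = FALSE)
tweets <- import_labels(tweets, path)
```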

4_hand_harvesting_trolls.R

This file serves the same function as file 3, but for pre-selected users (users you'd like to add whose usernames you already know). It creates a file for classification, ~/outputs/classifyme_handtroll.csv, in which you mark each user's troll status as 0 or 1. The script then binds the troll status to the rest of the data and merges the result with the data from file 3 to create ~/data/data.csv, which contains all the data relevant to the project.
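The final merge can be sketched as stacking the two labelled tables and writing the result to disk. `combine_sources()` and its `source` column are hypothetical additions for illustration. The sketch assumes both tables already share the same columns.

```r
# Stack API-collected and hand-harvested rows into one table and save it.
combine_sources <- function(api_data, hand_data, path) {
  if (!setequal(names(api_data), names(hand_data))) {
    stop("Both tables must have the same columns before combining.")
  }
  api_data$source  <- "api_search"       # hypothetical provenance column
  hand_data$source <- "hand_harvested"
  combined <- rbind(api_data, hand_data) # rbind matches data frame columns by name
  write.csv(combined, path, row.names = FALSE)
  combined
}

api_data  <- data.frame(screenName = c("user_a", "user_b"), troll = c(1L, 0L),
                        stringsAsFactors = FALSE)
hand_data <- data.frame(troll = 1L, screenName = "known_troll",
                        stringsAsFactors = FALSE)
out <- combine_sources(api_data, hand_data, file.path(tempdir(), "data.csv"))
```

Because `rbind()` on data frames matches columns by name, the two tables do not need their columns in the same order.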

6_emoji.r

This file scrapes an emoji database from the web.

7_Editing.R

This file preprocesses the raw data into the variables used in the analysis and visualizations.
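The README does not list the derived variables. The sketch below illustrates one possible preprocessing step: turning tweet text into per-emoji counts with a lookup table like the one file 6 scrapes. The two-column `emoji`/`name` layout and the helper `count_emoji()` are assumptions.

```r
# Count how often each emoji in `emoji_db` appears in each element of `text`.
# Returns a matrix with one row per text and one column per emoji.
count_emoji <- function(text, emoji_db) {
  counts <- vapply(
    emoji_db$emoji,
    function(e) {
      # Fixed, byte-wise matching; -1 means "no match", so count positives.
      hits <- gregexpr(e, text, fixed = TRUE, useBytes = TRUE)
      vapply(hits, function(h) sum(h > 0L), integer(1))
    },
    integer(length(text)),
    USE.NAMES = FALSE
  )
  counts <- matrix(counts, nrow = length(text))
  colnames(counts) <- emoji_db$name
  counts
}

# Toy lookup table and tweets (hypothetical).
emoji_db <- data.frame(emoji = c("\U0001F602", "\U0001F525"),
                       name  = c("face_with_tears_of_joy", "fire"),
                       stringsAsFactors = FALSE)
tweet_text <- c("so funny \U0001F602\U0001F602", "this is \U0001F525", "no emoji here")
emoji_counts <- count_emoji(tweet_text, emoji_db)
```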

8_visuals.R

This file runs the visualizations and analyses, from exploratory data analysis (EDA) through to the models.
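The README does not name the model used. As a stand-in, the sketch below builds a troll classifier with logistic regression (`glm()` with a binomial family) and a simple train/test split. The feature names are hypothetical. The data are simulated only so that the example runs, and they say nothing about the project's results.

```r
set.seed(1)
n <- 300
# Simulated accounts with two hypothetical features.
toy <- data.frame(
  log_followers  = rnorm(n, mean = 5, sd = 2),
  tweets_per_day = rexp(n, rate = 1 / 20)
)
# Simulated labels: prolific, less-followed accounts are likelier trolls.
p_true <- plogis(-1 + 0.05 * toy$tweets_per_day - 0.3 * toy$log_followers)
toy$troll <- rbinom(n, size = 1, prob = p_true)

# Hold out 30% of rows to estimate out-of-sample accuracy.
test_idx <- sample(n, size = round(0.3 * n))
train <- toy[-test_idx, ]
test  <- toy[test_idx, ]

fit <- glm(troll ~ log_followers + tweets_per_day,
           data = train, family = binomial)

# Predicted troll probabilities, classified with a 0.5 cutoff.
prob_troll <- predict(fit, newdata = test, type = "response")
pred_troll <- as.integer(prob_troll > 0.5)

confusion <- table(predicted = pred_troll, actual = test$troll)
accuracy  <- mean(pred_troll == test$troll)
```

The model is evaluated on a held-out set so the reported accuracy reflects users it has not seen. The 0.5 cutoff is a default choice; a different threshold trades false positives against missed trolls.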