StateOfTheUnion

A tweet and speech analysis that extracts specific information and patterns from two text sources: a collection of tweets and historical State of the Union speeches.

Project Description

Provide all R code and solutions by knitting your final R Markdown file in RStudio into a single document.

  1. Using the tweets.csv data available on the GitHub site, provide code to do the following:
  • Identify all tweets with the word ‘flight’ in them
  • How many tweets end in a question mark?
  • How many tweets contain airport codes? (Assume any three consecutive capital letters form an airport code.)
  • Identify all tweets with URLs in them
  • Replace all instances of repeated exclamation points with a single exclamation point
  • Replace consecutive exclamation points, question marks, and periods with a single period, split each tweet on periods, and create a list in which each element is the vector of split strings from one tweet
  2. You now have the fundamental R tools to complete this exercise, but you may still have to explore new techniques and packages. You will work with the full text of the State of the Union speeches from 1790 to 2012, which are all in the file stateoftheunion1790-2012.txt on the GitHub site. Read the text into R and manipulate it to create a data frame with the following summaries for each speech:
  • the name of the President who gave the speech
  • the year the speech was given
  • the month the speech was given
  • the day of the week the speech was given
  • the number of sentences in the speech
  • the number of words in the speech
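For the first four tweet tasks in part 1, a minimal base-R sketch might look like the following. It assumes the tweet text is already in a character vector; the column name `text` in the commented `read.csv()` call is a hypothetical placeholder, and the small inline sample stands in for the real data. The word match is case-insensitive and whole-word, the airport-code rule is exactly the assignment's "three consecutive capital letters" assumption (so all-caps words such as "THE" also count), and URLs are assumed to start with `http://` or `https://`.

```r
# Hypothetical: tweets <- read.csv("tweets.csv", stringsAsFactors = FALSE)$text
# A small inline sample stands in for the real data here.
tweets <- c(
  "My flight to LAX was delayed again?",
  "Great service!!! Thanks http://t.co/abc123",
  "Flight attendants were lovely",
  "Is JFK open today?",
  "No complaints."
)

# Tweets containing the word "flight" (case-insensitive, whole word only)
flight_tweets <- tweets[grepl("\\bflight\\b", tweets, ignore.case = TRUE, perl = TRUE)]

# Number of tweets ending in a question mark (trailing spaces allowed)
n_question <- sum(grepl("\\?\\s*$", tweets, perl = TRUE))

# Number of tweets with a possible airport code, using the assignment's
# assumption that any three consecutive capital letters form one.
# perl = TRUE keeps [A-Z] to ASCII capitals regardless of locale.
n_airport <- sum(grepl("[A-Z]{3}", tweets, perl = TRUE))

# Tweets containing a URL (assumed to start with http:// or https://)
url_tweets <- tweets[grepl("https?://[^[:space:]]+", tweets)]

print(flight_tweets)
cat("Ending in '?':", n_question, "\n")
cat("With airport codes:", n_airport, "\n")
print(url_tweets)
```

Dropping the `\\b` word boundaries would turn the first check into a substring match, so "flights" would also count.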
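The two text-cleaning tasks at the end of part 1 could be sketched with `gsub()` and `strsplit()` as below, again on a hypothetical inline sample. One interpretation is assumed: a single `!`, `?`, or `.` is treated as a run of length one and also becomes a period, so every sentence-ending mark splits the tweet. Trimming whitespace with `trimws()` is an optional extra, not part of the task.

```r
# Hypothetical sample tweets standing in for the real data
tweets <- c(
  "Delayed again!!! Not happy!!",
  "Where is my bag?? Lost... Again!",
  "Smooth flight. Friendly crew"
)

# Collapse every run of two or more "!" into a single "!"
single_bang <- gsub("!{2,}", "!", tweets)

# Replace each run of "!", "?" and "." (length one or more) with one period
normalized <- gsub("[!?.]+", ".", tweets)

# Split on the literal period; strsplit() returns a list holding one
# character vector per tweet. trimws() removes the leftover spaces.
split_tweets <- lapply(strsplit(normalized, ".", fixed = TRUE), trimws)

print(single_bang)
str(split_tweets)
```

`strsplit()` drops the empty string after a trailing period but keeps one before a leading period, so a tweet that starts with punctuation will begin with `""`.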
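For part 2, one possible base-R approach is sketched below. The layout of stateoftheunion1790-2012.txt is an assumption here: speeches separated by lines containing only `***`, each starting with a title line, the President's name, and a date such as "January 8, 1790". Check this against the actual file and adjust the header indices if it differs. The sentence count (runs of `.`, `!`, `?`) and word count (whitespace-separated tokens) are rough heuristics; abbreviations such as "Mr." inflate the sentence count. The function name `parse_speeches` is hypothetical.

```r
# Sketch under an ASSUMED file layout (verify against the real file):
#   ***
#   State of the Union Address
#   <President's name>
#   <Month Day, Year>, e.g. "January 8, 1790"
#   <speech text ...>
parse_speeches <- function(lines) {   # hypothetical helper name
  lines <- trimws(lines)
  keep <- lines != "***"
  # Speech id increments at every "***" separator line
  id <- cumsum(!keep)
  chunks <- split(lines[keep], id[keep])

  rows <- lapply(chunks, function(chunk) {
    chunk <- chunk[chunk != ""]                 # drop blank lines
    if (length(chunk) < 4) return(NULL)         # skip preamble or empty chunks
    m <- regmatches(chunk[3],
                    regexec("^([[:alpha:]]+) ([0-9]{1,2}), ([0-9]{4})$", chunk[3]))[[1]]
    if (length(m) == 0) return(NULL)            # date line not in assumed format
    month <- match(m[2], month.name)            # base R's English month names
    if (is.na(month)) return(NULL)
    date <- as.Date(sprintf("%04d-%02d-%02d",
                            as.integer(m[4]), month, as.integer(m[3])))
    body <- paste(chunk[-(1:3)], collapse = " ")
    data.frame(
      president = chunk[2],
      year      = as.integer(m[4]),
      month     = month.name[month],
      weekday   = weekdays(date),               # locale-dependent day name
      sentences = lengths(regmatches(body, gregexpr("[.!?]+", body))),
      words     = lengths(regmatches(body, gregexpr("[^[:space:]]+", body))),
      stringsAsFactors = FALSE
    )
  })

  rows <- Filter(Negate(is.null), rows)
  do.call(rbind, unname(rows))
}

# Hypothetical inline sample; the speech body is placeholder text.
sample_lines <- c(
  "Table of contents (placeholder)",
  "***",
  "",
  "State of the Union Address",
  "George Washington",
  "January 8, 1790",
  "",
  "Fellow-Citizens of the Senate and House of Representatives:",
  "Placeholder sentence one. Placeholder sentence two!"
)

speeches <- parse_speeches(sample_lines)
print(speeches)
```

On the real file this would be called as `parse_speeches(readLines("stateoftheunion1790-2012.txt", warn = FALSE))`. Matching the month word against `month.name` avoids relying on the locale to parse month names, although `weekdays()` still returns day names in the current locale.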