/spikeFit

Columbia University Applied Math Senior Seminar presentation: simple linear and trinomial linear regressions fitted to hourly Twitter hashtag mentions

Primary language: MATLAB

For our Applied Mathematics Senior Seminar presentation, we (Ezra Kebrab and Tristan Renaud) fitted regression models to Twitter trends, specifically the hourly hashtag mentions for three events in 2009.

Our objective was to see whether past data spikes could be modeled, so that Twitter and other storage-intensive sites could anticipate and better prepare for future spikes. We also explored applications of the model to other areas, including marketing, macroeconomic prediction, portfolio and risk management, and server management.

We used the awk commands included in this repository to extract the mentions of specific hashtags from the dataset. spikePlot.m plots the hourly hashtag mentions for each event. spikeFilter.m fits and graphs the simple linear and trinomial linear regressions for each event.
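As a rough illustration of the two kinds of fit, the sketch below uses MATLAB's polyfit and polyval on a small synthetic series. It is not the code from spikeFilter.m. The hourly counts are made up, the variable names are hypothetical, and "trinomial" is read here as a three-term (degree-2) polynomial, which is an assumption; if the project used a different form, change the degree passed to polyfit. Both models count as linear regressions because they are linear in their coefficients.

```matlab
% Synthetic hourly mention counts for one hashtag (illustrative only,
% not taken from the Infochimps dataset).
t = (0:11)';                                 % hours since start of window
y = [3 5 4 6 40 120 95 60 35 20 12 8]';      % mentions per hour

% Simple linear fit: y ~ p1(1)*t + p1(2)
p1 = polyfit(t, y, 1);

% Assumed reading of "trinomial": y ~ p2(1)*t^2 + p2(2)*t + p2(3)
p2 = polyfit(t, y, 2);

% Fitted values at the observed hours
yhat1 = polyval(p1, t);
yhat2 = polyval(p2, t);

% Coefficient of determination for each model
sst  = sum((y - mean(y)).^2);
r2_1 = 1 - sum((y - yhat1).^2) / sst;
r2_2 = 1 - sum((y - yhat2).^2) / sst;

fprintf('linear R^2 = %.3f, three-term R^2 = %.3f\n', r2_1, r2_2);
```

Because the three-term model contains the straight line as a special case, its R^2 on the same data can never be lower than the linear one. A higher value alone does not show that the extra term helps with prediction.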

The selected events were Michael Jackson's death; Kanye West's interruption of Taylor Swift at the 2009 VMAs (also known as the "Ima let you finish" incident); and the balloon boy incident.

Raw data was pulled from Infochimps.

Presentation slides can be found here: http://slidesha.re/12ktygr

The MATLAB code and awk commands can be found here: https://github.com/ekebrab/spikeFit