Latent Dirichlet allocation and timeseries analysis for summarization of live TV shows via Twitter

Figures: Tweet volume change during the talk show Enikos (18/4/2016); top LDA topics during the talk show Anatropi (12/4/2016).

We provide the source code for our analysis, the PDF report, and a Jupyter Notebook with both input and output. You can also watch the presentation here.

Read the tweets for a hashtag, retrieved with tweepy, from the JSON/TXT file.
Load them into a pandas DataFrame and keep only the relevant columns (tweet text and timestamp in our case).
(Optional) Convert the timestamps to the local time zone (Athens time in our case; consult pytz to adjust).
Count the tweet volume per minute.
Plot the time series.
(Optional) Remove stopwords and find the most frequent tokens (Greek stopwords in our case, but any NLTK-supported language works).
LDA preprocessing: remove words that occur in only one document or in at least 95% of the documents.
Transform the tweets into a document-term matrix.
Obtain the words with high probabilities.
Obtain the feature names.
Print the LDA topics and assign topics to each tweet.

sergegoussev/tv-show-summarization-twitter