# Data Mining Fall 2016

Data Mining Project Fall 2016 for CS235

The full report is in final_report.pdf.

The poster is in poster.pdf.

## Package Installation

Install all the requirements from requirements.txt using pip:

pip install -r requirements.txt

After installation, create a file called new.py and add the lines below:

import nltk

nltk.download()

Save it and run it:

python new.py

It will take a few minutes to install the NLTK packages.

## Files Description

The files and their functions are as follows:

stream.py --> uses the Twitter Streaming API to collect live tweets

python stream.py > data_live.json

Stores the tweets in data_live.json.

parse_data.py --> parses the raw data and selects the necessary attributes
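The attribute selection can be pictured with a minimal sketch. It assumes data_live.json holds one tweet JSON object per line and that the fields of interest are `id_str`, `created_at`, `text`, and `user.screen_name`. The attributes parse_data.py actually keeps are not listed here, so the field choice and the helper name `select_attributes` are illustrative only.

```python
import json


def select_attributes(raw_line):
    """Return a slimmed-down tweet dict, or None for lines that are not tweets.

    The kept fields are an assumption, not the list used by parse_data.py.
    """
    raw_line = raw_line.strip()
    if not raw_line:
        return None  # the stream can emit blank keep-alive lines
    try:
        tweet = json.loads(raw_line)
    except json.JSONDecodeError:
        return None  # skip truncated or malformed lines
    if not isinstance(tweet, dict) or "text" not in tweet:
        return None  # e.g. delete or limit notices carry no text
    return {
        "id": tweet.get("id_str"),
        "created_at": tweet.get("created_at"),
        "text": tweet["text"],
        "user": (tweet.get("user") or {}).get("screen_name"),
    }


sample = (
    '{"id_str": "1", "created_at": "Mon Sep 26 21:00:00 +0000 2016", '
    '"text": "hello", "user": {"screen_name": "someone"}}'
)
print(select_attributes(sample))
```

Returning None for anything that is not a tweet lets the caller filter the stream with a single check before writing the parsed records out.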

python parse_data.py

Stores the parsed data in parsed_data.json.

sentiment_module.py and sentiment_trained.py:

sentiment_module.py --> trains the classifiers using positive.txt and negative.txt. It can also save the trained classifiers so that they do not have to be retrained on every run; for the time being, that code is commented out. It also has a custom classifier that uses the trained classifiers to find the sentiment of a text along with a confidence value.
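The idea behind the custom classifier, combining several trained classifiers and reporting a confidence value, can be sketched as a majority vote. This is a sketch under assumptions: the class name `VoteClassifier`, the labels "pos" and "neg", and the definition of confidence as the share of classifiers that agree with the winning label are illustrative, and the toy keyword functions stand in for classifiers trained on positive.txt and negative.txt.

```python
from collections import Counter


class VoteClassifier:
    """Majority vote over several classifiers (hypothetical helper)."""

    def __init__(self, *classifiers):
        if not classifiers:
            raise ValueError("at least one classifier is required")
        self.classifiers = classifiers

    def classify(self, text):
        """Return (label, confidence) for the given text."""
        votes = Counter(clf(text) for clf in self.classifiers)
        label, count = votes.most_common(1)[0]
        # Confidence is the fraction of classifiers that chose the winner.
        return label, count / len(self.classifiers)


# Toy stand-ins for trained classifiers: each maps text to "pos" or "neg".
def likes_great(text):
    return "pos" if "great" in text.lower() else "neg"


def dislikes_bad(text):
    return "neg" if "bad" in text.lower() else "pos"


def likes_win(text):
    return "pos" if "win" in text.lower() else "neg"


voter = VoteClassifier(likes_great, dislikes_bad, likes_win)
label, confidence = voter.classify("A great debate night")
print(label, round(confidence, 2))
```

Using an odd number of classifiers avoids ties when there are only two labels.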

sentiment_trained.py --> same as sentiment_module.py, but loads the saved classifiers instead of training them. It will not work unless the classifiers have already been saved.

sentiment_eval.py:

Uses sentiment_module or sentiment_trained to compute sentiments of all the tweets.
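A minimal sketch of this step, assuming each parsed tweet is a dict with a `text` key and that the classifier is exposed as a function returning a `(label, confidence)` pair. The function name `add_sentiment`, the key names `sentiment` and `confidence`, and the stand-in classifier are hypothetical.

```python
import json


def add_sentiment(tweets, classify):
    """Return copies of the tweets with sentiment and confidence attached."""
    annotated = []
    for tweet in tweets:
        label, confidence = classify(tweet["text"])
        # Build a new dict so the parsed input is left untouched.
        annotated.append({**tweet, "sentiment": label, "confidence": confidence})
    return annotated


def toy_classify(text):
    # Stand-in for the real classifier: keyword match, always fully confident.
    return ("pos" if "good" in text.lower() else "neg"), 1.0


parsed = [{"text": "Good answer"}, {"text": "Weak answer"}]
result = add_sentiment(parsed, toy_classify)
print(json.dumps(result, indent=2))
```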

python sentiment_eval.py

The output is stored in sentiment_output.json.

count_sentiment.py:

Counts the positive and negative tweets for each candidate.
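The tally can be sketched with a nested counter. The record layout here is an assumption: each entry is taken to carry a `candidate` key and a `sentiment` key with the value "pos" or "neg", which may differ from the fields in sentiment_output.json.

```python
from collections import Counter, defaultdict


def count_by_candidate(records):
    """Map each candidate to a Counter of sentiment labels."""
    counts = defaultdict(Counter)
    for record in records:
        counts[record["candidate"]][record["sentiment"]] += 1
    return counts


# Made-up records purely for illustration.
records = [
    {"candidate": "A", "sentiment": "pos"},
    {"candidate": "A", "sentiment": "neg"},
    {"candidate": "A", "sentiment": "pos"},
    {"candidate": "B", "sentiment": "neg"},
]
for candidate, tally in sorted(count_by_candidate(records).items()):
    print(candidate, "positive:", tally["pos"], "negative:", tally["neg"])
```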

python count_sentiment.py

Prints the results.

JSON and text files:

data_live.json --> contains a sample of the raw tweets collected during the debate

parsed_data.json --> contains the parsed information from the raw tweets

sentiment_output.json --> adds a sentiment value and a confidence value to the parsed tweets

positive.txt --> contains positive reviews for training

negative.txt --> contains negative reviews for training

asriv003/Presidential-Debate-Tweets-Analysis
