- A model build for Twitter Sentiment analysis, P file and H5 file included
- The Python 3 code to build the above model using Keras, Theano & Pandas
I was set the task to try and make a sentiment analysis model for Twitter for my dissertation even though the due by date is long past I still wanted to accomplish this task.
For my dissertation I was set the task to predict stock prices for a single company on the stock market using Twitter and sentiment analysis. I tried to see if you could predict if the stock market would go up or down given the sentiment of the Tweets.
This model uses a number of open source projects to work properly:
- Keras - A Machine Learning Neural Network front end framework
- Theano - A Machine Learning Neural Network back end framework (For Keras)
- Pandas - A Python dataframe library
- [Pickle] - A Python serilisation lib, I use this to serialise the Tokenier with all the word mappings in
- [RE] - Regular expressions are not used in the version but left the method in for removing hyperlinks, at and hashtag symbols
I used a Cuda on a Nvidia 960M it took about 4 hours/ish for 10 epochs of 900,000 Tweets, Every 100,000 Tweets testing using 10,000 Tweets is done.
This model is 79% accurate and has 0.41 loss. If you want to use remember to deseralise the Keras tokenier which is the *.p file the tokenizer ahs a 100,000 unique word count, the predict.py has an example. Here is a small example of the model output:
Tweet | Score |
---|---|
"fuck you, you are an asshole" | 0.3316856 |
"I love you" | 0.81485116 |
"yes! I cant wait to go" | 0.60499549 |
"I don't want to be here" | 0.11294857 |
"is the best person ever" | 0.74768722 |
Free Software, Hell Yeah!