- Get Twitter API keys.
- Create a file called `APIKeys.json` and store your API keys in it. You can use `APIkeyexample.txt` as a reference.
- Note that this `.json` file will not be pushed to Git unless you change the `.gitignore`.
- Generate tweets for a user or set of users:
    - Navigate to the `src` directory.
    - Run `python main.py --names <NAME1> <NAME2> ...`, where each `<NAMEi>` can be replaced with a Twitter handle.
    - The code will pull tweets and save them to the `data` directory.
    - It will also print generated tweets to the console.
- Determine sentence similarity:
    - Navigate to the `src` directory.
    - Run `python model_test.py <tweet_file> <K>`, where `<tweet_file>` is the relative path to a file in the `data` folder (for example, `../data/Harvard.csv`) and `<K>` designates how big your K-mer will be. `K` must be at least 2.
- `main.py`: Contains code to generate sentences given a list of Twitter handles at the command line.
- `model_generator.py`: Contains functions to generate the Markov model for a user. This includes getting tweets from a file, extracting K-mers, forming the model, and determining the next word given the current K-1 words.
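The K-mer model-building step can be sketched as follows; the function names `extract_kmers` and `build_model` and the plain-string tweet format are illustrative assumptions, not necessarily the repo's actual API:

```python
from collections import defaultdict

def extract_kmers(words, k):
    """Yield each run of k consecutive words (a K-mer)."""
    for i in range(len(words) - k + 1):
        yield tuple(words[i:i + k])

def build_model(tweets, k):
    """Map each (K-1)-word prefix to the list of words observed after it."""
    model = defaultdict(list)
    for tweet in tweets:
        words = tweet.split()
        for kmer in extract_kmers(words, k):
            # prefix = first k-1 words, next word = last word of the K-mer
            model[kmer[:-1]].append(kmer[-1])
    return model
```

For example, with `k=2` the tweet "the cat sat on the mat" yields the prefix `("the",)` mapping to `["cat", "mat"]`.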
- `model_test.py`: Contains functions to generate sentences from a model and test their similarity to the original tweets. Note that when run as the driver program, this file defaults to determining sentence accuracy.
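Sentence generation from such a model amounts to a random walk over the chain. This is a hypothetical sketch, assuming the model maps (K-1)-word prefix tuples to lists of following words:

```python
import random

def generate_sentence(model, seed, length=20):
    """Walk the chain: repeatedly sample a next word for the current prefix."""
    words = list(seed)
    prefix_len = len(seed)  # K-1 words of context
    while len(words) < length:
        choices = model.get(tuple(words[-prefix_len:]), [])
        if not choices:
            break  # dead end: no observed continuation for this prefix
        words.append(random.choice(choices))
    return " ".join(words)
```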
- `twitter_extractor.py`: Contains functions to connect to the Twitter API and extract tweets for one or more users.
- `comparison.py`: Contains functions to compare words/sentences for quantitative analysis.
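One simple quantitative sentence comparison is word-level Jaccard similarity; this is a sketch of that idea, not necessarily the metric `comparison.py` actually implements:

```python
def jaccard_similarity(sentence_a, sentence_b):
    """Word-level Jaccard index: |intersection| / |union| of the word sets."""
    a = set(sentence_a.lower().split())
    b = set(sentence_b.lower().split())
    if not (a or b):
        return 1.0  # two empty sentences are identical by convention
    return len(a & b) / len(a | b)
```

For example, "the cat sat" and "the cat ran" share 2 of 4 distinct words, giving a similarity of 0.5.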
numpy, scipy, Tweepy, NLTK