A final project for CS 108: Ethics of Intelligent Systems at Harvard.
Python
Usage
Get Twitter API keys.
Create a file called APIKeys.json and store your API keys in it. You can use APIkeyexample.txt as a reference.
Note that this .json file will not be pushed to git unless you change the .gitignore.
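As a rough illustration of how a script might read that key file, here is a minimal sketch. The function name load_api_keys is hypothetical and not taken from this repository, and the sketch deliberately assumes nothing about the field names inside APIKeys.json; check APIkeyexample.txt for the actual layout.

```python
import json
from pathlib import Path


def load_api_keys(path="APIKeys.json"):
    """Hypothetical helper: read the key file and return its contents as a dict."""
    key_file = Path(path)
    if not key_file.is_file():
        raise FileNotFoundError(
            f"{path} not found; create it using APIkeyexample.txt as a reference"
        )
    with key_file.open(encoding="utf-8") as handle:
        keys = json.load(handle)
    # Assumption: the file holds a single JSON object of key/value pairs.
    if not isinstance(keys, dict):
        raise ValueError(f"{path} should contain a JSON object")
    return keys
```

Failing early with a clear message is useful here because the key file is git-ignored, so a fresh clone will never contain it.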
Generate tweets for a user or set of users
Navigate to the src directory
Run python main.py --names <NAME1> <NAME2> ..., where each <NAMEi> is a Twitter handle.
The code will pull tweets and save them to the data directory
This will also print generated tweets to the console
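For readers unfamiliar with how a flag like --names can accept several values, the following sketch shows one way to parse it with argparse. It is an assumption about the general approach, not a copy of main.py; the function name parse_args and the choice to make the flag required are illustrative.

```python
import argparse


def parse_args(argv=None):
    """Hypothetical parser: collect one or more Twitter handles after --names."""
    parser = argparse.ArgumentParser(
        description="Generate tweets for one or more Twitter handles."
    )
    parser.add_argument(
        "--names",
        nargs="+",        # one or more values, gathered into a list
        required=True,    # assumption: at least one handle must be given
        metavar="NAME",
        help="Twitter handle(s) to pull tweets for",
    )
    return parser.parse_args(argv)


# Passing an explicit list mimics: python main.py --names Harvard MIT
args = parse_args(["--names", "Harvard", "MIT"])
print(args.names)
```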
Determine sentence similarity
Navigate to the src directory
Run python model_test.py <tweet_file> <K>, where <tweet_file> is the relative path to a file in the data directory (for example, ../data/Harvard.csv) and <K> is the size of each K-mer. K must be at least 2.
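To make the role of K concrete, here is a small sketch of K-mer extraction over words. It assumes a K-mer is a run of K consecutive whitespace-separated words; the function name extract_kmers is hypothetical and the tokenization in the actual code may differ. K must be at least 2 because each K-mer is split into K-1 context words plus the one word that follows them.

```python
def extract_kmers(text, k):
    """Return every run of k consecutive words in text as a tuple."""
    if k < 2:
        raise ValueError("K must be at least 2")
    words = text.split()  # assumption: simple whitespace tokenization
    # A text with fewer than k words yields an empty list.
    return [tuple(words[i:i + k]) for i in range(len(words) - k + 1)]


print(extract_kmers("the quick brown fox", 3))
# [('the', 'quick', 'brown'), ('quick', 'brown', 'fox')]
```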
Important files
main.py: Contains code to generate sentences given a list of Twitter handles at the command line.
model_generator.py: Contains functions to generate the Markov model for a user. This includes getting tweets from a file, extracting K-mers, forming the model, and determining next words given the current K-1 words.
model_test.py: Contains functions to generate sentences from a model and test their similarity to the original tweets. Note that when run as the driver program, this file will default to determining sentence accuracy.
twitter_extractor.py: Contains functions to connect to the Twitter API and extract tweets for one or more users.
comparison.py: Contains functions to compare words/sentences for quantitative analysis.
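The Markov model described for model_generator.py can be sketched roughly as follows: each K-mer contributes its first K-1 words as a context and its last word as a possible continuation. This is a minimal illustration under assumptions, not the repository's implementation; the names build_model, next_word, and generate are hypothetical, and tweets are tokenized by whitespace.

```python
import random
from collections import defaultdict


def build_model(tweets, k):
    """Map each (K-1)-word context to the list of words observed after it."""
    if k < 2:
        raise ValueError("K must be at least 2")
    model = defaultdict(list)
    for tweet in tweets:
        words = tweet.split()
        for i in range(len(words) - k + 1):
            context = tuple(words[i:i + k - 1])
            model[context].append(words[i + k - 1])
    return dict(model)


def next_word(model, context, rng=random):
    """Pick a follower of context at random, or None if the context is unseen."""
    candidates = model.get(tuple(context))
    if not candidates:
        return None
    # Repeated followers appear multiple times, so choice is frequency-weighted.
    return rng.choice(candidates)


def generate(model, seed, max_words=20, rng=random):
    """Extend seed (K-1 words) until max_words is reached or no follower exists."""
    words = list(seed)
    width = len(seed)
    while len(words) < max_words:
        word = next_word(model, words[-width:], rng)
        if word is None:
            break
        words.append(word)
    return " ".join(words)


model = build_model(["a b c", "a b d"], k=3)
print(model)  # {('a', 'b'): ['c', 'd']}
print(generate(model, ("a", "b")))
```

Storing followers in a plain list keeps duplicates, which makes sampling proportional to how often each continuation appeared in the tweets without a separate count table.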