TwitterMetrics

This repository contains a Python script which scrapes my Twitter feed for all tweets. The resulting data is saved to a file on disk in JSON Lines format, so that it can be re-analysed rapidly without worrying about Twitter API restrictions (such as rate limits). Two fields are then extracted from each tweet (created_at, text), and the timestamp is converted to local time (Amsterdam). Only tweets which contain specific terms are kept; each term can be thought of as an event. For every tweet which contains a given term (event), the date and time at which the tweet was created (the event occurred) are stored in a list of lists. Using this data we can see the time-series history of when a specific term (event) was tweeted (occurred), and from this we can also create a histogram showing the probability of an event occurring at each time of day over a 24-hour period.
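The extraction and histogram steps described above can be sketched roughly as follows. This is a minimal illustration, not the code in scrapeTweets.py: the function names collect_events and hourly_probability are hypothetical, the timestamp layout is assumed to be the classic Twitter API one (for example "Wed Oct 10 20:19:24 +0000 2018"), and a dictionary keyed by term is used here in place of the list of lists.

```python
import json
from collections import Counter
from datetime import datetime, tzinfo

# Assumed timestamp layout, e.g. "Wed Oct 10 20:19:24 +0000 2018".
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def collect_events(lines, terms, local_tz: tzinfo):
    """Map each term to the local datetimes of the tweets that mention it.

    `lines` is any iterable of JSON Lines strings (an open file works).
    """
    events = {term: [] for term in terms}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the file
        tweet = json.loads(line)
        created = datetime.strptime(tweet["created_at"], TWITTER_TIME_FORMAT)
        local_time = created.astimezone(local_tz)  # convert to local time
        text = tweet["text"].lower()
        for term in terms:
            if term.lower() in text:  # case-insensitive match (an assumption)
                events[term].append(local_time)
    return events


def hourly_probability(timestamps):
    """Return 24 values: the fraction of events falling in each hour of the day."""
    total = len(timestamps)
    if total == 0:
        return [0.0] * 24
    counts = Counter(ts.hour for ts in timestamps)
    return [counts[hour] / total for hour in range(24)]
```

For Amsterdam local time, local_tz could be zoneinfo.ZoneInfo("Europe/Amsterdam") (Python 3.9 or later), and lines could be the open data file. The values returned by hourly_probability can be passed directly to a bar plot to draw the histogram.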

To scrape all data from a Twitter feed, run the script from the command line: python3 scrapeTweets.py -u where the flag '-u' tells the script to disregard any data stored on disk and instead fetch all the tweets from the feed afresh, giving the most up-to-date collection of tweets.

A custom input file can also be specified via the command line: python3 scrapeTweets.py -i <inputfile>

The command-line flags -u and -i are mutually exclusive: you can either generate new data or analyse existing data, but not both in the same run.

By default (no flags) the script will look for the file "timeline_json.jsonl", and analyse the data within it.
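One way to express the flag behaviour described above is with argparse's mutually exclusive groups. The sketch below is an illustration only and may differ from how scrapeTweets.py actually parses its arguments; build_parser, choose_input and the attribute names update and inputfile are hypothetical.

```python
import argparse

# Default file named in this README.
DEFAULT_INPUT = "timeline_json.jsonl"


def build_parser():
    """Build a parser in which -u and -i cannot be combined."""
    parser = argparse.ArgumentParser(prog="scrapeTweets.py")
    group = parser.add_mutually_exclusive_group()
    group.add_argument("-u", dest="update", action="store_true",
                       help="ignore stored data and scrape the feed afresh")
    group.add_argument("-i", dest="inputfile", metavar="inputfile", default=None,
                       help="analyse the tweets stored in this file")
    return parser


def choose_input(args):
    """Return the file to analyse, falling back to the default when -i is absent."""
    return args.inputfile or DEFAULT_INPUT
```

With this layout, passing both -u and -i makes argparse print a usage error and exit, while passing no flags leaves inputfile unset so that choose_input falls back to timeline_json.jsonl.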