This scraper allows you to scrape tweets mentioning a particular term and store them in a CSV file.
First, install the tweepy library:
pip install tweepy
Next, either:
- Click the "Clone or Download" button and download the zip file, then extract simple_scraper.py from it.
- Install git and clone this repository by entering the following command in a terminal:
git clone https://github.com/srisi/Simple-Twitter-Scraper.git
Finally, to get these scripts working, you need to jump through some hoops to get Twitter API access. Namely, you need the following:
- Consumer Key
- Consumer Secret
- Access Token
- Access Token Secret
To obtain these keys, do the following:
- Create a Twitter account.
- Go to https://apps.twitter.com/ and click "Create an application". Here, you can enter placeholder values or whatever you want, e.g.:
  Name: asdf846
  Description: asdfasdfasdf
  Website: http://www.asdf.com
  Click "Create your Twitter application". On the following screen, go to the "Keys and Access Tokens" tab. Copy the Consumer Key and Consumer Secret from "Application Settings", and copy the Access Token and Access Token Secret from "Your Access Token".
- Fill in the following placeholder values at the top of simple_scraper.py with your keys:
  CONSUMER_KEY, CONSUMER_SECRET, ACCESS_TOKEN, ACCESS_TOKEN_SECRET
Note: each of these keys has to be a string, so you need to surround your keys with quotation marks, e.g.
CONSUMER_KEY = "OEitehociOGe069toeifDotea"
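If you want to check that your keys work before running the full scraper, the sketch below shows how the script presumably authenticates with tweepy (the credential values are placeholders and the exact internals of simple_scraper.py may differ):

    import tweepy

    # Placeholder keys -- replace these with the values from apps.twitter.com.
    CONSUMER_KEY = "OEitehociOGe069toeifDotea"
    CONSUMER_SECRET = "your-consumer-secret"
    ACCESS_TOKEN = "your-access-token"
    ACCESS_TOKEN_SECRET = "your-access-token-secret"

    # Authenticate against the Twitter API with the four keys.
    auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
    auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
    api = tweepy.API(auth)

    # Fetch a handful of tweets mentioning a term as a sanity check.
    # (tweepy 3.x uses api.search; in tweepy 4.x the call is api.search_tweets.)
    for tweet in api.search(q='@realdonaldtrump', count=5):
        print(tweet.text)

If this prints a few tweets, your keys are set up correctly and simple_scraper.py should work with the same values.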
For a tutorial, read (and run) through the tutorial() function at the end of simple_scraper.py.
To run the tutorial, open a terminal, move to the folder containing simple_scraper.py, and execute python simple_scraper.py
The easiest way to use simple_scraper.py is to modify the if __name__ == '__main__': section at the end of the file and then run the scraper from a terminal using python simple_scraper.py.
For example, to scrape up to 100 tweets per day mentioning @realdonaldtrump over the last 10 days and store them in trump.csv, your main section would look like this:
if __name__ == '__main__':
    # tutorial()  <- adding a # means the line is "commented out" and will not be run
    tweets = scrape_term_by_day('@realdonaldtrump', start_date='2016-09-06', end_date='2016-09-16', tweets_per_day=100)
    store_tweets_to_csv(tweets, 'trump.csv')
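Once the scraper has finished, you can take a quick look at trump.csv with Python's built-in csv module (a minimal sketch assuming Python 3; the exact columns depend on what store_tweets_to_csv writes):

    import csv

    # Print the first few rows of the CSV produced by the scraper.
    with open('trump.csv', newline='', encoding='utf-8') as f:
        for i, row in enumerate(csv.reader(f)):
            print(row)
            if i >= 4:  # only show the first five rows
                break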