TwitterQuoteScraper is a command-line interface for scraping quotations from Twitter. You can save the data to a local file, a Google Sheet, or a database.
Note. A tweet is treated as a quotation only if it meets all of the following conditions:
- it must not be a retweet or a reply
- it must not contain a URL, media (image or video), or any emoji
- it must match the regular expression:
^[\"\']{0,1}(?P<phrase>[A-Z].*[\.!?])[\"\']{0,1}\s*?[-~]\s*(?P<author>.*)$
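To make the pattern easier to read, here is a minimal sketch that applies it to a tweet's text with Python's `re` module. The pattern is copied from the rule above; the helper name `parse_quote` is hypothetical and not necessarily how app.py is organized.

```python
import re

# Pattern copied verbatim from the rule above.
QUOTE_RE = re.compile(
    r"""^[\"\']{0,1}(?P<phrase>[A-Z].*[\.!?])[\"\']{0,1}\s*?[-~]\s*(?P<author>.*)$"""
)


def parse_quote(text):
    """Return (phrase, author) if text looks like a quotation, else None.

    parse_quote is a hypothetical helper used only for illustration.
    """
    match = QUOTE_RE.match(text.strip())
    if match is None:
        return None
    return match.group("phrase"), match.group("author").strip()


print(parse_quote('"Talk is cheap. Show me the code." - Linus Torvalds'))
print(parse_quote("just a lowercase status update"))
```

In words: the phrase must start with a capital letter and end with `.`, `!` or `?`, it may be wrapped in quotes, and it must be followed by `-` or `~` and then the author. Note that the author group accepts an empty string, so a caller may want to reject that case separately.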
- Python 3.6
- Pipenv
- Twitter API Keys and Tokens
- Google service account (required only if the google_sheet command is used); this blog has a simplified walkthrough for getting one
- Download and extract the zip file, or use Git to clone this repository.
- Inside the directory, open a terminal and run:
pipenv install
Important. Activate the virtual environment before running any command; this puts the environment-specific python and pip executables on the shell's PATH. To activate it, run:
pipenv shell
# Single account
python app.py --twitter-creds creds/twitter.json local_file --twitter-handles @prog_quotes
# Multiple accounts
python app.py --twitter-creds creds/twitter.json local_file --twitter-handles @prog_quotes @CodeWisdom
# Specify the folder where the files will be generated
python app.py --twitter-creds creds/twitter.json local_file --twitter-handles @prog_quotes --output-folder quotes/
# Override the default file type
python app.py --twitter-creds creds/twitter.json local_file --twitter-handles @prog_quotes --file-type json
The database and each Twitter handle's table will be created if they don't already exist.
# Single account
python app.py --twitter-creds creds/twitter.json database --twitter-handles @prog_quotes --database-configs creds/database.json
# Multiple accounts
python app.py --twitter-creds creds/twitter.json database --twitter-handles @prog_quotes @CodeWisdom --database-configs creds/database.json
# Ignore warnings
python -W ignore app.py --twitter-creds creds/twitter.json database --twitter-handles @prog_quotes --database-configs creds/database.json
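The create-if-missing behaviour described for the database command can be sketched as follows. This is an illustration only: it uses Python's built-in `sqlite3` as a stand-in for MySQL, and the table layout (`phrase`, `author`) and the helper names `table_name` and `store_quotes` are assumptions, not the project's actual schema or code.

```python
import re
import sqlite3


def table_name(handle):
    """Derive a table name from a Twitter handle (hypothetical helper)."""
    name = handle.lstrip("@")
    # Allow only letters, digits and underscores so the name is safe to
    # interpolate into SQL as an identifier.
    if not re.fullmatch(r"[A-Za-z0-9_]+", name):
        raise ValueError("invalid Twitter handle: %r" % handle)
    return name.lower()


def store_quotes(conn, handle, quotes):
    """Create the handle's table if needed, then insert (phrase, author) rows."""
    table = table_name(handle)
    # Assumed schema: one row per unique (phrase, author) pair.
    conn.execute(
        'CREATE TABLE IF NOT EXISTS "%s" ('
        "phrase TEXT NOT NULL, author TEXT NOT NULL, "
        "UNIQUE (phrase, author))" % table
    )
    # INSERT OR IGNORE is SQLite syntax; it skips rows that already exist.
    conn.executemany(
        'INSERT OR IGNORE INTO "%s" (phrase, author) VALUES (?, ?)' % table,
        quotes,
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
store_quotes(conn, "@prog_quotes", [("Talk is cheap.", "Linus Torvalds")])
```

Running `store_quotes` a second time with the same rows leaves the table unchanged, which is the property a scheduled scraper would want.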
Before you run the command, you must set up a Google spreadsheet for the service account to programmatically insert and edit values.
- Log in to Google and create a spreadsheet.
- Share the spreadsheet with the client_email you'll find inside the Google service account's JSON file.
python app.py --twitter-creds creds/twitter.json google_sheet --service-account creds/google.json --spreadsheet-id 1S8xsN8D6nD2KM5LoSZOIFnuw3zvP4XWRZLHMMfbsbPk --twitter-handles @prog_quotes
# Alphabetically sort the second/phrase column
python app.py --twitter-creds creds/twitter.json google_sheet --service-account creds/google.json --spreadsheet-id 1S8xsN8D6nD2KM5LoSZOIFnuw3zvP4XWRZLHMMfbsbPk --twitter-handles @prog_quotes --sort '{"order": "asc", "column": 1}'
The command for this spreadsheet is set up to run every day at midnight UTC via a cron job.
local_file Generate and save quotations to a file.
usage: python app.py --twitter-creds local_file --twitter-handles [--output-folder] [--file-type]
database Save quotations to a MySQL database.
usage: python app.py --twitter-creds database --twitter-handles --database-configs
google_sheet Save quotations to a Google spreadsheet.
usage: python app.py --twitter-creds google_sheet --service-account --spreadsheet-id --twitter-handles [--sort]
--twitter-creds Path to the JSON file that contains your Twitter app's credentials; see creds/twitter.json for the expected keys. This argument must come before the command name; think of it as a login form for accessing Twitter's API.
--twitter-handles List of Twitter handles to scrape (the leading @ character is optional).
commands: local_file, database, google_sheet
--output-folder The folder where the files are generated; defaults to the current working directory.
commands: local_file
--file-type The format of the generated file: CSV (default) or JSON.
commands: local_file
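As a rough illustration of how --output-folder and --file-type might interact, the sketch below writes a list of quotations to either CSV or JSON. The function name `save_quotes`, the one-file-per-handle naming, and the `author`/`phrase` keys are assumptions made for this example; the files app.py actually produces may differ.

```python
import csv
import json
from pathlib import Path


def save_quotes(quotes, handle, output_folder=".", file_type="csv"):
    """Write quotes (dicts with 'author' and 'phrase') and return the file path.

    Hypothetical helper: file naming and column order are assumptions.
    """
    folder = Path(output_folder)
    folder.mkdir(parents=True, exist_ok=True)
    file_type = file_type.lower()
    path = folder / f"{handle.lstrip('@')}.{file_type}"

    if file_type == "csv":
        # newline="" lets the csv module control line endings itself.
        with path.open("w", newline="", encoding="utf-8") as fh:
            writer = csv.writer(fh)
            writer.writerow(["author", "phrase"])
            writer.writerows((q["author"], q["phrase"]) for q in quotes)
    elif file_type == "json":
        with path.open("w", encoding="utf-8") as fh:
            json.dump(quotes, fh, ensure_ascii=False, indent=2)
    else:
        raise ValueError("file_type must be 'csv' or 'json'")
    return path
```

For example, `save_quotes(quotes, "@prog_quotes", "quotes/", "json")` would write `quotes/prog_quotes.json` under these assumptions.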
--database-configs Path to the database configuration JSON file; see creds/database.json for the expected keys.
commands: database
--service-account Path to your Google service account's JSON file.
commands: google_sheet
--spreadsheet-id The spreadsheet's ID.
commands: google_sheet
--sort A JSON-formatted string that specifies how to sort the spreadsheet's values. Defaults to:
{
  "order": null, // one of: null, "asc" or "desc"
  "column": 0 // zero-based index of the column to sort by
}
commands: google_sheet
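The --sort value can be thought of as a small JSON object merged over the defaults. The sketch below shows one way to parse and apply it to rows in memory; `parse_sort` and `sort_rows` are hypothetical names, and treating a null order as "leave rows as they are" and sorting case-insensitively are assumptions of this example rather than documented behaviour.

```python
import json

# Defaults as documented for --sort.
DEFAULT_SORT = {"order": None, "column": 0}


def parse_sort(raw):
    """Merge a JSON string from the command line over the default sort spec."""
    spec = dict(DEFAULT_SORT)
    if raw:
        spec.update(json.loads(raw))
    if spec["order"] not in (None, "asc", "desc"):
        raise ValueError('"order" must be null, "asc" or "desc"')
    return spec


def sort_rows(rows, spec):
    """Return rows sorted by the configured column (assumed zero-based)."""
    if spec["order"] is None:
        # Assumption: a null order means no sorting is applied.
        return list(rows)
    return sorted(
        rows,
        key=lambda row: row[spec["column"]].lower(),
        reverse=spec["order"] == "desc",
    )


rows = [
    ["Linus Torvalds", "Talk is cheap. Show me the code."],
    ["Donald Knuth", "Premature optimization is the root of all evil."],
]
spec = parse_sort('{"order": "asc", "column": 1}')
for author, phrase in sort_rows(rows, spec):
    print(phrase, "-", author)
```

Note that the `//` comments in the default value shown above are explanatory only; a string passed to --sort has to be plain JSON without comments for `json.loads` to accept it.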
Distributed under the MIT license. See LICENSE for more information.
Herbert Verdida / @bertdida
Thanks to @Tobaloidee for making this project's logo.