twitter_download

Download scripts for distributing twitter data.

Primary language: Python. License: MIT.

==================== Semeval Twitter data download script + user info

Update: This repository now includes a script that also downloads user information. download_tweets_user_api.py downloads the tweets together with their authors' user information for SEMEVAL 2016: Sentiment Analysis in Twitter. If you are not interested in user information, you can still use download_tweets_api.py, or omit the "--user" option when running download_tweets_user_api.py. The script is an initial template that collects basic user information; you can change it to include or exclude user fields, or extend it to download additional user information if desired (e.g. to check whether any of the other users in the dataset are friends).

Prerequisites:

Please read and follow the instructions below for download_tweets_api.py before using this script.

Usage:

python download_tweets_user_api.py --dist input.txt --output output.txt --user

Output Format:

*_semeval_tweets.txt: tweet id \t topic \t tweet text

*_semeval_userinfo.txt: tweet id \t user id \t follower count \t status count \t description \t friend count \t location \t language \t name \t time zone
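If you want to load these two files in your own code, here is a minimal sketch. It assumes each file is plain tab-separated text with exactly the columns listed above, one record per line, and that only the last column may contain embedded tabs (a user description containing a tab would break this assumption). The field names and the `read_tab_file` helper are hypothetical, not part of this repository.

```python
import io
from typing import Dict, Iterator, List, TextIO

# Column orders copied from the output format descriptions above.
TWEET_FIELDS = ["tweet_id", "topic", "text"]
USERINFO_FIELDS = [
    "tweet_id", "user_id", "follower_count", "status_count", "description",
    "friend_count", "location", "language", "name", "time_zone",
]


def read_tab_file(handle: TextIO, fields: List[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per non-empty tab-separated line, keyed by `fields`."""
    for line in handle:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        # maxsplit lets extra tabs survive inside the LAST column only
        values = line.split("\t", len(fields) - 1)
        if len(values) != len(fields):
            raise ValueError(
                f"expected {len(fields)} columns, got {len(values)}: {line!r}"
            )
        yield dict(zip(fields, values))


if __name__ == "__main__":
    sample = "123\t456\t10\t200\tNLP fan\t5\tParis\ten\tAlice\tUTC\n"
    for row in read_tab_file(io.StringIO(sample), USERINFO_FIELDS):
        print(row["user_id"], row["follower_count"], row["time_zone"])
```

All values stay strings; convert counts such as `follower_count` with `int()` where you need numbers. To combine the two files, index the user rows by `tweet_id` and look them up while iterating over the tweet rows.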

==================== Semeval Twitter data download script (Original)

For downloading tweets that are distributed as IDs only, to protect user privacy. Uses the format of the Semeval Twitter sentiment analysis dataset.

Prerequisites:

sixohsix/twitter (the Python Twitter Tools library), which can be installed with:

easy_install twitter

Usage:

The first time you run the script, it should open a web browser, ask you to log in to Twitter, and show a PIN for you to enter at a prompt generated by the script.

  1. Log in to Twitter with your user name in your default browser.
  2. Run the script like this to download your credentials: python download_tweets_api.py --dist=tweeti-a.dist.tsv
  3. Download tweets like so: python download_tweets_api.py --dist=tweeti-a.dist.tsv --output=downloaded.tsv

Note: downloading the Semeval sentiment analysis training dataset takes about 18 hours.

Restarting after a partial download:

If the script hangs partway through the download for any reason, use the --partial argument to specify the file containing the partially downloaded results.
This way you won't have to start again from scratch:

python download_tweets_api.py --dist=tweeti-a.dist.tsv --partial=downloaded.tsv --output=downloaded2.tsv
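The idea behind --partial can be illustrated with a small sketch: IDs already present in the partial file are skipped and only the remaining ones are fetched. This is not the script's actual code. It assumes the tweet ID is the first tab-separated column in both the .dist file and the partial output, and `remaining_ids` / `downloaded_ids` are hypothetical helpers.

```python
from typing import Iterable, List, Set


def _tweet_id(line: str) -> str:
    # ASSUMPTION: the tweet ID is the first tab-separated column of every line.
    return line.split("\t", 1)[0].strip()


def downloaded_ids(partial_lines: Iterable[str]) -> Set[str]:
    """IDs that already appear in a partial output file."""
    return {_tweet_id(line) for line in partial_lines if line.strip()}


def remaining_ids(dist_lines: Iterable[str], partial_lines: Iterable[str]) -> List[str]:
    """Tweet IDs from the .dist file still to fetch, in file order, without repeats."""
    done = downloaded_ids(partial_lines)
    todo: List[str] = []
    seen: Set[str] = set()
    for line in dist_lines:
        if not line.strip():
            continue  # ignore blank lines
        tid = _tweet_id(line)
        if tid in done or tid in seen:
            continue  # already downloaded, or already queued
        seen.add(tid)
        todo.append(tid)
    return todo


if __name__ == "__main__":
    dist = ["1\tu1\tpositive\n", "2\tu2\tnegative\n", "3\tu3\tneutral\n", "2\tu2\tnegative\n"]
    partial = ["1\tu1\tpositive\tgreat day\n"]
    print(remaining_ids(dist, partial))
```

De-duplicating keeps a tweet from being requested twice when its ID appears on more than one line of the .dist file, which matters when each request counts against the API rate limit.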

Task A Mention Test Script

To print out the mentions and annotations from Task A, you can use the testIndices.py script like so:

python testIndices.py downloaded.tsv

This just prints out the mentions with sentiment annotations for easier inspection.
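For reference, here is a rough sketch of what extracting a mention involves, assuming each downloaded Task A line stores token offsets for the annotated phrase. The column layout (tweet ID, user ID, start, end, label, tweet text) and the 0-based, inclusive, whitespace-token indexing are assumptions for illustration, not taken from testIndices.py; `extract_mention` is a hypothetical helper. Check the layout against your own files before relying on it.

```python
from typing import Optional, Tuple


def extract_mention(line: str) -> Optional[Tuple[str, str]]:
    """Return (mention_text, label) for one downloaded Task A line, or None.

    HYPOTHETICAL layout, tab-separated:
        tweet_id, user_id, start, end, label, tweet text
    where start/end are 0-based, inclusive indices into the
    whitespace-separated tokens of the tweet text.
    """
    parts = line.rstrip("\r\n").split("\t", 5)
    if len(parts) != 6:
        return None  # malformed line
    _, _, start, end, label, text = parts
    try:
        s, e = int(start), int(end)
    except ValueError:
        return None  # offsets are not integers
    tokens = text.split()
    if not 0 <= s <= e < len(tokens):
        return None  # offsets do not fit this text (e.g. tweet no longer available)
    return " ".join(tokens[s:e + 1]), label


if __name__ == "__main__":
    print(extract_mention("10\t20\t1\t2\tpositive\tI love this phone\n"))
```

Returning None instead of raising lets you scan a whole file and count the lines whose offsets no longer line up with the downloaded text.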

Notes:

  • You may need to manually change the authorization link that is printed out so that it uses https:// instead of http://.
  • The time on your computer needs to be set accurately. Thanks to Canberk for noting this on the email list.