/twitter_download

Download scripts for distributing Twitter data.


Semeval Twitter data download script

This script downloads tweets that are distributed as IDs, rather than as text, to protect privacy. It uses the format of the Semeval Twitter sentiment analysis dataset.

Prerequisites:

The sixohsix/twitter and tqdm/tqdm packages, which can be installed with:

easy_install twitter
easy_install tqdm

Usage:

The first time you run the script, it should open a web browser, ask you to log in to Twitter, and show a PIN for you to enter at a prompt generated by the script.

  1. Log in to Twitter with your user name in your default browser.
  2. Run the script like this to download your credentials: python download_tweets_api.py --dist=tweeti-a.dist.tsv
  3. Download tweets like so:
python download_tweets_api.py --dist=tweeti-a.dist.tsv --output=downloaded.tsv

Note that it takes about 18 hours to download the Semeval sentiment analysis training dataset.

Restarting after a partial download:

If the script hangs partway through the download for whatever reason, use the --partial argument to specify the file containing the partially downloaded results.
This way you won't have to start from scratch:

python download_tweets_api.py --dist=tweeti-a.dist.tsv --partial=downloaded.tsv --output=downloaded2.tsv
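The idea behind resuming can be sketched roughly as follows. This is only an illustration, not the actual code in download_tweets_api.py: it assumes both files are tab-separated with the tweet ID in the first column, and the helper names load_done_ids and pending_rows are hypothetical.

```python
import csv


def load_done_ids(partial_path):
    """Collect the tweet IDs already present in a partial output file.

    Assumption: the file is tab-separated and the tweet ID is column 0.
    """
    done = set()
    with open(partial_path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE):
            if row:
                done.add(row[0])
    return done


def pending_rows(dist_path, done_ids):
    """Yield rows of the distribution file whose tweet ID is not in done_ids."""
    with open(dist_path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE):
            if row and row[0] not in done_ids:
                yield row
```

Under these assumptions, only the rows returned by pending_rows("tweeti-a.dist.tsv", load_done_ids("downloaded.tsv")) would still need to be fetched.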

Task A Mention Test Script

To print out the mentions and annotations from task A you can use the testIndices.py script like so:

python testIndices.py downloaded.tsv

This just prints out the mentions with sentiment annotations for easier inspection.
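As a rough sketch of what such an inspection involves, the snippet below pulls the annotated mention out of one downloaded line. It is not the code in testIndices.py. It assumes each line has six tab-separated columns (tweet ID, user ID, start token index, end token index, sentiment label, tweet text), that the indices refer to whitespace-separated tokens, and that the end index is inclusive; the function name extract_mention is hypothetical.

```python
def extract_mention(line):
    """Return (mention, label) for one line of a downloaded task A file.

    Assumed columns: tweet ID, user ID, start index, end index, label, text.
    Assumed indexing: whitespace tokens, end index inclusive.
    """
    fields = line.rstrip("\n").split("\t")
    _tweet_id, _user_id, start, end, label, text = fields[:6]
    tokens = text.split()
    mention = " ".join(tokens[int(start):int(end) + 1])
    return mention, label
```

If the real files use a different column order or an exclusive end index, the slice needs adjusting accordingly.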

Notes:

  • You may need to manually change the authorization link that is printed out so that it uses https:// instead of http://.
  • The time on your computer needs to be set accurately. Thanks to Canberk for noting this on the email list.