pip install lxml pyquery elasticsearch
sudo sysctl -w vm.max_map_count=262144
docker-compose up -d
$ get-tweet.py --help
usage: get-tweet.py [-h] -f FIRSTDATE -l LASTDATE -t TEXT [-c COUNT] [-e EXPORTHOST] [-p EXPORTPORT]
Get tweet and import Elasticsearch.
optional arguments:
-h, --help show this help message and exit
-f FIRSTDATE, --firstdate FIRSTDATE <Require> First tweet date "YYYY-MM-DD".
-l LASTDATE, --lastdate LASTDATE <Require> Last tweet date "YYYY-MM-DD".
-t TEXT, --text TEXT <Require> Search text "xxxxx".
-c COUNT, --count COUNT <Option> Maximum number to collect "N", default 300.
-e EXPORTHOST, --exporthost EXPORTHOST <Option> Destination elasticsearch host, default localhost.
-p EXPORTPORT, --exportport EXPORTPORT <Option> Destination elasticsearch port, default 9200.
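The options in the help text above map naturally onto an `argparse` parser. The following is only a sketch of how such a parser could be declared, not the actual source of get-tweet.py: the function name `build_parser` is hypothetical, and the integer types for `--count` and `--exportport` are an assumption.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the options listed in the help text."""
    parser = argparse.ArgumentParser(
        prog="get-tweet.py",
        description="Get tweet and import Elasticsearch.",
    )
    # The three options marked <Require> in the help text.
    parser.add_argument("-f", "--firstdate", required=True,
                        help='<Require> First tweet date "YYYY-MM-DD".')
    parser.add_argument("-l", "--lastdate", required=True,
                        help='<Require> Last tweet date "YYYY-MM-DD".')
    parser.add_argument("-t", "--text", required=True,
                        help='<Require> Search text "xxxxx".')
    # Optional settings with the documented defaults (int types are assumed).
    parser.add_argument("-c", "--count", type=int, default=300,
                        help='<Option> Maximum number to collect "N", default 300.')
    parser.add_argument("-e", "--exporthost", default="localhost",
                        help="<Option> Destination elasticsearch host, default localhost.")
    parser.add_argument("-p", "--exportport", type=int, default=9200,
                        help="<Option> Destination elasticsearch port, default 9200.")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(
    ["-f", "2015-05-01", "-l", "2015-09-30", "-t", "europe refugees"]
)
print(args.count, args.exporthost, args.exportport)
```

With only the three required options given, the optional ones fall back to the defaults stated in the help text.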
Original README.md
A project written in Python to get old tweets. It bypasses some limitations of the official Twitter API.
The official Twitter API has a bothersome time constraint: you can't get tweets older than a week. Some tools provide access to older tweets, but most of them require payment up front. I searched for other tools to do this job and didn't find any, so I analyzed how Twitter Search works in the browser until I understood its flow. Basically, when you open a Twitter page a scroll loader starts; as you scroll down you get more and more tweets, all through calls to a JSON provider. By mimicking those calls we get the main advantage of Twitter Search in the browser: it can reach the deepest, oldest tweets.
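The scroll-loader flow described above amounts to a paging loop: request a page, keep its tweets, and ask again with the cursor the provider returned. The sketch below shows only that loop. `fetch_page`, its `items` and `next_cursor` keys, and the fake timeline are all hypothetical stand-ins; the real JSON provider and its response format are not reproduced here.

```python
def fetch_page(cursor):
    """Hypothetical stand-in for one call to the JSON provider.

    Returns a dict with the tweets of one page and the cursor for the
    next page (None when there is nothing left to scroll).
    """
    fake_timeline = ["tweet %d" % i for i in range(1, 8)]  # 7 fake tweets
    page_size = 3
    start = cursor or 0
    next_start = start + page_size
    return {
        "items": fake_timeline[start:next_start],
        "next_cursor": next_start if next_start < len(fake_timeline) else None,
    }


def collect_tweets(max_tweets=0):
    """Keep "scrolling" until the provider runs dry or max_tweets is reached.

    A max_tweets value lower than 1 means "no limit".
    """
    results = []
    cursor = None
    while True:
        page = fetch_page(cursor)
        results.extend(page["items"])
        if max_tweets >= 1 and len(results) >= max_tweets:
            return results[:max_tweets]
        cursor = page["next_cursor"]
        if cursor is None:
            return results


print(len(collect_tweets()))   # all 7 fake tweets
print(collect_tweets(4))       # stops after the second page, trimmed to 4
```

The loop stops early once enough tweets are collected, so a small limit costs only as many requests as it needs.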
This package assumes Python 2.x. The Python 3 "got3" folder is maintained as experimental and is not officially supported.
The expected package dependencies are listed in the "requirements.txt" file for pip. Run the following command to install them:
pip install -r requirements.txt
- Tweet: Model class that gives some information about a specific tweet.
  - id (str)
  - permalink (str)
  - username (str)
  - text (str)
  - date (date)
  - retweets (int)
  - favorites (int)
  - mentions (str)
  - hashtags (str)
  - geo (str)
- TweetManager: A manager class that helps retrieve tweets as Tweet model objects.
  - getTweets (TweetCriteria): Returns the list of tweets retrieved by using an instance of TweetCriteria.
- TweetCriteria: A collection of search parameters to be used together with TweetManager.
  - setUsername (str): An optional specific username of a Twitter account, without the "@".
  - setSince (str, "yyyy-mm-dd"): A lower bound date to restrict the search.
  - setUntil (str, "yyyy-mm-dd"): An upper bound date to restrict the search.
  - setQuerySearch (str): A query text to be matched.
  - setTopTweets (bool): If True, only the Top Tweets will be retrieved.
  - setNear (str): A reference location area from where tweets were generated.
  - setWithin (str): A distance radius from the "near" location (e.g. 15mi).
  - setMaxTweets (int): The maximum number of tweets to be retrieved. If this number is unset or lower than 1, all possible tweets will be retrieved.
- Main: Examples of how to use the package.
- Exporter: Exports tweets to a CSV file named "output_got.csv".
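The criteria setters listed above are meant to be chained, which works when each setter returns the object itself. Here is a minimal sketch of that pattern. `SketchCriteria` and its `toQuery` helper are hypothetical names used for illustration, not the package's classes, and combining the fields with the `from:`, `since:` and `until:` search operators is an assumption about how such criteria could be turned into a query.

```python
class SketchCriteria(object):
    """Illustrative stand-in for a criteria object with chainable setters."""

    def __init__(self):
        self.username = None
        self.since = None
        self.until = None
        self.querySearch = None
        self.maxTweets = 0  # lower than 1 means "retrieve everything"

    # Each setter stores its value and returns self so calls can be chained.
    def setUsername(self, username):
        self.username = username
        return self

    def setSince(self, since):
        self.since = since
        return self

    def setUntil(self, until):
        self.until = until
        return self

    def setQuerySearch(self, querySearch):
        self.querySearch = querySearch
        return self

    def setMaxTweets(self, maxTweets):
        self.maxTweets = maxTweets
        return self

    def toQuery(self):
        """Hypothetical helper: join the criteria that were set into one query."""
        parts = []
        if self.querySearch:
            parts.append(self.querySearch)
        if self.username:
            parts.append("from:" + self.username)
        if self.since:
            parts.append("since:" + self.since)
        if self.until:
            parts.append("until:" + self.until)
        return " ".join(parts)


criteria = (SketchCriteria()
            .setUsername("barackobama")
            .setSince("2015-09-10")
            .setUntil("2015-09-12")
            .setMaxTweets(1))
print(criteria.toQuery())  # from:barackobama since:2015-09-10 until:2015-09-12
```

Criteria that were never set are simply left out of the query, which is why every setter is optional.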
- Get tweets by username
import got  # Python 2 package; the later examples reuse this import
tweetCriteria = got.manager.TweetCriteria().setUsername('barackobama').setMaxTweets(1)
tweet = got.manager.TweetManager.getTweets(tweetCriteria)[0]
print tweet.text
- Get tweets by query search
tweetCriteria = got.manager.TweetCriteria().setQuerySearch('europe refugees').setSince("2015-05-01").setUntil("2015-09-30").setMaxTweets(1)
tweet = got.manager.TweetManager.getTweets(tweetCriteria)[0]
print tweet.text
- Get tweets by username and bound dates
tweetCriteria = got.manager.TweetCriteria().setUsername("barackobama").setSince("2015-09-10").setUntil("2015-09-12").setMaxTweets(1)
tweet = got.manager.TweetManager.getTweets(tweetCriteria)[0]
print tweet.text
- Get the last 10 top tweets by username
tweetCriteria = got.manager.TweetCriteria().setUsername("barackobama").setTopTweets(True).setMaxTweets(10)
# first one
tweet = got.manager.TweetManager.getTweets(tweetCriteria)[0]
print tweet.text
- Get usage help
python Exporter.py -h
- Get tweets by username
python Exporter.py --username "barackobama" --maxtweets 1
- Get tweets by query search
python Exporter.py --querysearch "europe refugees" --maxtweets 1
- Get tweets by username and bound dates
python Exporter.py --username "barackobama" --since 2015-09-10 --until 2015-09-12 --maxtweets 1
- Get the last 10 top tweets by username
python Exporter.py --username "barackobama" --maxtweets 10 --toptweets
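The commands above write their results to a CSV file. As a rough illustration of that export step, the sketch below writes the Tweet fields listed earlier with Python 3's `csv` module. It is not the code of Exporter.py: the `Tweet` namedtuple, the `write_tweets_csv` function, the comma delimiter, the column order, and the sample values are all assumptions.

```python
import csv
import datetime
import io
from collections import namedtuple

# Stand-in for the Tweet model, using the documented field names.
Tweet = namedtuple("Tweet", [
    "id", "permalink", "username", "text", "date",
    "retweets", "favorites", "mentions", "hashtags", "geo",
])


def write_tweets_csv(tweets, handle):
    """Write a header row plus one row per tweet; return the number of tweets."""
    writer = csv.writer(handle)
    writer.writerow(Tweet._fields)
    count = 0
    for tweet in tweets:
        writer.writerow(tweet)
        count += 1
    return count


# Made-up sample data, written to an in-memory buffer instead of a real file.
sample = Tweet(
    id="1",
    permalink="https://example.invalid/status/1",
    username="example_user",
    text="hello, world",
    date=datetime.date(2015, 9, 10),
    retweets=0,
    favorites=0,
    mentions="",
    hashtags="",
    geo="",
)
buffer = io.StringIO()
write_tweets_csv([sample], buffer)
print(buffer.getvalue())
```

To write a real file instead, pass a handle opened with `open("output_got.csv", "w", newline="")`. The `csv` module quotes the tweet text automatically when it contains the delimiter.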