Twitter stream + search API grabber handling various config options such as collecting only during specific time periods, or limiting the collection to some locations.
- Install dependencies:
sudo apt-get install mongodb-10gen
pip install -r requirements.txt
-
Copy config.json.example to config.json
-
Set your Twitter API key and generate the related Access Token
"twitter": {
"key": "<Consumer Key (API Key)>xxxxxxxxxxxxxxxxxxxxx",
"secret": "<Consumer Secret (API Secret)>xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
"oauth_token": "<Access Token>xxxxxxxxx-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
"oauth_secret": "<Access Token Secret>xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
}
-
Write down the list of desired keywords and @users as json array.
"keywords": [ "amour", "@medialab_scpo" ],
Note that there are two possibilities to filter further:
- geolocalisation mode: just add ``"geolocalisation": "Paris, France"` field to the config with the desired geographical boundaries or give in coordinates of the desired box as shown in the config example file
- time limited keywords mode, in order to filter on specific keywords during planned time period:
"time_limited_keywords": { "#m6": [ ["2014-05-01 16:00", "2014-05-08 16:05"], ["2014-05-08 16:00", "2014-05-08 16:05"], ["2014-05-15 16:00", "2014-05-08 16:05"], ["2014-05-22 16:00", "2014-05-08 16:05"] ], "bieber": [ ["2014-05-08 16:00", "2014-05-08 16:05"] ] },
-
Run with:
./gazouilloire/run.py
- Data is stored in your mongo, you can also export it easily with simple scripts such as those in the
bin
directory:
# To export a csv with the most useful fields:
bin/export_csv.py
# To export the whole text content of the tweets:
bin/export_all_text.py