ckoepp/TwitterSearch

Collect tweets from 03-20 Aug, 2014 for a particular location


Hello,

I am a PhD student. I am new to Python and am using TwitterSearch to collect tweets from 03-20 Aug for a particular location (geocode). I am trying to run the following code:

```python
from TwitterSearch import TwitterSearchOrder, TwitterSearch, TwitterSearchException

try:
    tso = TwitterSearchOrder()
    tso.set_keywords(['protest', 'protested', 'protesting', 'riot', 'rioted',
                      'rioting',
                      'rally',
                      'rallied',
                      'rallying',
                      'marched',
                      'marching',
                      'strike',
                      'striked',
                      'striking'])
    tso.set_language('en')  # we want to see English tweets only
    tso.set_geocode(37.00, -92.00, 100)
    tso.set_include_entities(False)  # and don't give us all that entity information
    tso.set_since_id(367020906049835008)
    tso.set_max_id(501488707195924481)
    ts = TwitterSearch(
        consumer_key='.............',  # my access credentials
        consumer_secret='.........................',
        access_token='...................',
        access_token_secret='.......................'
    )

    for tweet in ts.search_tweets_iterable(tso):
        user = tweet['user']['screen_name'].encode("ASCII", errors='ignore')
        text = tweet['text'].encode("ASCII", errors='ignore')
        time = tweet['created_at'].encode("ASCII", errors='ignore')
        print('@%s tweeted: %s on %s' % (user, text, time) + '\n')

except TwitterSearchException as e:
    print(e)
```

However, the code above does not return anything. Please help me with this. Also, can I collect old tweets for a particular location without any keyword?

Thanks in advance.

Regards,
Mohammed

Hi Mohammed,

Your issue actually touches on several problems...

First of all, you need to use tso.set_keywords("protest OR rally OR strike"). Entering those keywords without this magic OR keyword works just like a Google search: the API will search for tweets containing all of those keywords. Also note that Twitter will refuse to answer requests that are too complex. This is usually not much of a problem, as you can simply split them into several queries. Your particular case is even simpler, as the whole search is based on substrings, so asking for a term like protest will also give you tweets containing protested, etc.
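To make the AND-versus-OR difference concrete, here is a small offline sketch. `build_or_query`, `matches_all` and `matches_any` are hypothetical helpers, and the matching is a deliberately simplified whole-word model that ignores how Twitter actually tokenizes terms; it only illustrates why a plain keyword list can return far fewer tweets than an OR query.

```python
from typing import List

def build_or_query(keywords: List[str]) -> str:
    """Join keywords with the OR operator, e.g. 'protest OR rally'."""
    return " OR ".join(keywords)

def matches_all(text: str, keywords: List[str]) -> bool:
    """Simplified model of a plain space-separated query: all terms required."""
    words = text.lower().split()
    return all(k in words for k in keywords)

def matches_any(text: str, keywords: List[str]) -> bool:
    """Simplified model of an OR query: any one term is enough."""
    words = text.lower().split()
    return any(k in words for k in keywords)

keywords = ["protest", "rally", "strike"]
tweets = [
    "Huge protest downtown today",
    "Rally planned for Saturday",
    "Workers strike at the port",
]

print(build_or_query(keywords))                       # protest OR rally OR strike
print(sum(matches_all(t, keywords) for t in tweets))  # 0
print(sum(matches_any(t, keywords) for t in tweets))  # 3
```

None of the sample tweets contains all three words, so the "all keywords" model finds nothing while the OR model finds every one.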

Second, the Twitter Search API only goes back a maximum of about 10 days. Only the Library of Congress in the US and commercial services currently offer (un)limited access to older tweets. More details on this are in the Twitter documentation. Don't be confused if you see more results on the official Twitter Search website: that service doesn't use the Twitter Search API but a private interface that only Twitter's internal services can use.
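One way to sanity-check whether a `since_id`/`max_id` pair can fall inside that short search window is to decode the timestamp embedded in the IDs. The sketch below assumes Twitter's Snowflake ID layout (milliseconds since the epoch 1288834974657, stored above the lowest 22 bits), which only applies to IDs issued after November 2010; `snowflake_to_datetime` is a hypothetical helper name.

```python
from datetime import datetime, timezone

# Snowflake epoch used by Twitter IDs, in milliseconds (2010-11-04 UTC).
TWITTER_EPOCH_MS = 1288834974657

def snowflake_to_datetime(tweet_id: int) -> datetime:
    """Return the UTC creation time encoded in a Snowflake tweet ID.

    Everything above the lowest 22 bits holds milliseconds since
    TWITTER_EPOCH_MS; the low bits are worker and sequence numbers.
    """
    ms = (tweet_id >> 22) + TWITTER_EPOCH_MS
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)

# The two IDs used in the first snippet of this thread.
for label, tweet_id in [("since_id", 367020906049835008),
                        ("max_id", 501488707195924481)]:
    print(label, snowflake_to_datetime(tweet_id).date())
```

Decoded this way, the `since_id` from the first snippet falls in August 2013 and the `max_id` in August 2014, so the requested range spans about a year, far beyond what the Search API returns.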

The last issue concerns your encoding. You'll lose a lot of information by encoding everything as ASCII; remember that tweets can and often do contain Unicode characters. Update to Python 3 and you'll get Unicode strings by default.
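A minimal illustration of what `.encode("ASCII", errors='ignore')` throws away, using a made-up tweet text (written with escape sequences so the source stays ASCII-only):

```python
text = "caf\u00e9 protest \u270a"  # "café protest ✊" written with escapes

# What the first snippet does: every non-ASCII character is silently dropped.
lossy = text.encode("ascii", errors="ignore")
print(lossy)  # b'caf protest '

# In Python 3, str is already Unicode, so the text can be used as-is.
# If bytes are really needed (e.g. for a binary file), UTF-8 is lossless.
utf8 = text.encode("utf-8")
print(utf8.decode("utf-8") == text)  # True
```

Note also that in Python 3 `.encode()` returns `bytes`, so formatting the result with `%s` prints it as `b'...'` rather than as plain text.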

Hope this information helps a bit. If you have more questions, just ask them here. I'll leave this issue open in case you need more help with TwitterSearch or the Twitter API.

Cheers,
Chris

Hi,

Thanks for your help, Chris. I am trying this modified code and getting some errors. Could you please help me fix it? I also have another question: approximately how many old tweets will I get from the search?

```python
from TwitterSearch import TwitterSearchOrder, TwitterSearch, TwitterSearchException

try:
    tso = TwitterSearchOrder()  # create a TwitterSearchOrder object
    #tso.set_keywords(['ferguson'])

    tso.set_keywords(["protest OR riot OR rally OR marched OR strike OR demonstration"])

    tso.set_language('en')

    tso.set_geocode(51.5072, -0.1275, 400)
    tso.set_include_entities(False)

    ts = TwitterSearch(
        consumer_key='.....................',
        consumer_secret='.......................................',
        access_token='.....................',
        access_token_secret='............................'
    )

    counter = 0
    sleep_at = 123
    sleep_for = 60

    todo = True
    next_max_id = 0
    while todo:
        response = ts.search_tweets(tso)
        todo = len(response['content']['statuses']) != 0

        for tweet in response['content']['statuses']:
            tweet_id = tweet['id']
            tweet_text = tweet['text'].encode("UTF-8", errors='ignore')
            creation_time = tweet['created_at']
            geo_info = tweet['coordinates']['coordinates']
            print('%i \t %s \t %s \t %s' % (tweet_id, tweet_text, geo_info, creation_time))
            if (tweet_id < next_max_id) or (next_max_id == 0):
                next_max_id = tweet_id
                next_max_id -= 1
        tso.set_max_id(next_max_id)

except TwitterSearchException as e:  # take care of all those ugly errors if there are some
    print(e)
```

Thanks in advance.

Regards,
Mohammed

How many tweets you receive is not predictable. The only thing Twitter guarantees is that you'll see tweets from the last 7 days (going back a maximum of around 10 days, which depends very much on Twitter's current load). Further details are explained in the Twitter documentation. If you're doing a PhD, you should really read this at the very beginning, as those limitations might change your whole approach to gathering the data, right?
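Since the result count can't be known in advance, the usual approach is to keep paging backwards with `max_id` until a request comes back empty. The sketch below models that loop offline: `collect_all` and `make_fake_fetcher` are hypothetical names, and the fake fetcher merely stands in for a real search call followed by `tso.set_max_id(...)`.

```python
from typing import Callable, Dict, List, Optional

# A page fetcher stands in for one search request: given a max_id (None for
# the first request) it returns a list of tweet dicts, newest first.
PageFetcher = Callable[[Optional[int]], List[Dict]]

def collect_all(fetch_page: PageFetcher) -> List[Dict]:
    """Page backwards through results with max_id until a page is empty."""
    collected: List[Dict] = []
    max_id: Optional[int] = None
    while True:
        page = fetch_page(max_id)
        if not page:
            break  # nothing older is left in the searchable window
        collected.extend(page)
        # Request only tweets strictly older than the oldest one seen so far.
        max_id = min(tweet["id"] for tweet in page) - 1
    return collected

def make_fake_fetcher(ids: List[int], page_size: int) -> PageFetcher:
    """Hypothetical stand-in for the API: serves `ids` newest first, in pages."""
    ordered = sorted(ids, reverse=True)

    def fetch(max_id: Optional[int]) -> List[Dict]:
        eligible = [i for i in ordered if max_id is None or i <= max_id]
        return [{"id": i} for i in eligible[:page_size]]

    return fetch

if __name__ == "__main__":
    tweets = collect_all(make_fake_fetcher(list(range(100, 125)), page_size=10))
    print(len(tweets))  # 25: pages of 10, 10 and 5, then an empty page
```

Computing the next `max_id` once per page, after the loop over its tweets, avoids tracking a running minimum inside the loop.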

Also note that giving a lat,long coordinate excludes every tweet that doesn't have such metadata attached to it. The vast majority of tweets carry no geolocation information. Again, this is discussed in the Twitter API documentation.
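Even in location-based results, a tweet's `coordinates` field may be `null`, for example when it was matched by place or profile location rather than an exact point. In that case a line such as `tweet['coordinates']['coordinates']` from the second snippet raises a `TypeError`. A minimal defensive sketch, with a hypothetical `point_of` helper and made-up sample data, assuming the GeoJSON `[longitude, latitude]` ordering Twitter uses for this field:

```python
from typing import Dict, List, Optional, Tuple

def point_of(tweet: Dict) -> Optional[Tuple[float, float]]:
    """Return (longitude, latitude) for a geotagged tweet, else None.

    The 'coordinates' field is either missing/None or a GeoJSON point whose
    own 'coordinates' list is ordered [longitude, latitude].
    """
    geo = tweet.get("coordinates")
    if not geo:
        return None
    lon, lat = geo["coordinates"]
    return lon, lat

# Hypothetical, trimmed-down tweet dicts for illustration only.
sample: List[Dict] = [
    {"id": 3, "coordinates": {"type": "Point", "coordinates": [-0.1275, 51.5072]}},
    {"id": 2, "coordinates": None},
    {"id": 1},
]

geotagged = []
for tweet in sample:
    point = point_of(tweet)
    if point is None:
        continue  # skip tweets without an exact location instead of crashing
    geotagged.append((tweet["id"], point))

print(geotagged)  # [(3, (-0.1275, 51.5072))]
```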

About your code snippet: I can't debug your program for you, sorry :(
I might be able to help if you ask a more precise question. What exactly is not working? Which exceptions do you get? What exactly do you want the code to do?

Closing because there was no further response; the question seems to have been answered and/or the OP has moved on.