README
There are two scripts designed to be used together to create dynamic social networks from Twitter.com:
- CollectTweets.r gathers tweets containing the same hashtag.
- CreateGraph.py makes social networks from a CSV of tweets.
filename: CollectTweets.r
description: collects tweets containing a user-specified hashtag.
website: http://www.russellshepherd.com/d/?q=blog/replacing-twapperkeeper-r
The hashtag variable contains the text of the hashtag (minus the actual pound sign) you want the API to search for, and max.results contains the number of tweets you'd like to pull (up to a max of 1500; see more about limitations at the website).
The results are exported in a .csv file with the following format:
[row number], publication date, tweet content, author name
filename: CreateGraph.py
description: accepts a CSV file of Twitter data generated by CollectTweets.r and generates a 1-mode digraph in edgelist format
and a dynamic graph in GEXF format.
Calling the script:
This program accepts three arguments: the input filename, the output filename,
and a binary flag for the dynamic network (0 = no dynamic network, 1 = dynamic network).
Example:
python CreateGraph.py tweets.csv edgelist.txt 1
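The three arguments could be validated along the following lines. This is only a sketch of the calling convention described above, not the actual code in CreateGraph.py; the function name parse_args and the error messages are hypothetical.

```python
import sys


def parse_args(argv):
    """Return (input_path, output_path, make_dynamic) from a sys.argv-style list.

    Hypothetical helper: it mirrors the documented calling convention,
    not necessarily how CreateGraph.py reads its arguments.
    """
    if len(argv) != 4:
        raise SystemExit("usage: python CreateGraph.py <input.csv> <output.txt> <0|1>")
    input_path, output_path, flag = argv[1], argv[2], argv[3]
    if flag not in ("0", "1"):
        raise SystemExit("the third argument must be 0 or 1")
    # "1" requests the dynamic network, "0" skips it.
    return input_path, output_path, flag == "1"


if __name__ == "__main__" and len(sys.argv) == 4:
    print(parse_args(sys.argv))
```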
Input:
A CSV file with three columns: date, tweet content, username
Example:
"Fri, 08 Jan 2012, 13:22:45 +0000","RT@cnn @johndoe Protest in Egypt #jan25","username@twitter.com (Authorname)"
The "date" field is formatted as: DoW, dd MMM YYYY HH:MM:SS +0000
+0000 refers to the timezone.
The message field can contain any characters allowed by Twitter, as well as Retweets and Tweet-ats and Hash-tags. RT@ (or "RT @") represents a Retweet; @ a tweet-at; and # a hashtag.
The R script CollectTweets.r (see also: http://www.russellshepherd.com/d/?q=blog/replacing-twapperkeeper-r) will collect Twitter data in this format automatically for any given hashtag.
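As an illustration of the input format, one row could be read with Python's standard library roughly as follows. This is a sketch under assumptions: the helper name parse_row is hypothetical, the file is assumed to have exactly three columns and no header row, and the username is assumed to be the part of the author field before the "@". The example row carries a comma after the year that the format description does not show, so the sketch tolerates both spellings.

```python
import csv
import io
import re
from datetime import datetime

# Format described above: DoW, dd MMM YYYY HH:MM:SS +0000
DATE_FORMAT = "%a, %d %b %Y %H:%M:%S %z"


def parse_row(row):
    """Turn one three-column CSV row into (timestamp, message, username)."""
    date_text, message, author_field = row
    # Drop an optional comma directly after the four-digit year.
    date_text = re.sub(r"(\d{4}),", r"\1", date_text)
    timestamp = datetime.strptime(date_text, DATE_FORMAT)
    # Assumption: "username@twitter.com (Authorname)" -> "username"
    username = author_field.split("@", 1)[0]
    return timestamp, message, username


sample = (
    '"Fri, 08 Jan 2012, 13:22:45 +0000",'
    '"RT@cnn @johndoe Protest in Egypt #jan25",'
    '"username@twitter.com (Authorname)"\n'
)
rows = [parse_row(row) for row in csv.reader(io.StringIO(sample))]
```

If the CSV also carries a leading row-number column or a header line, those would need to be skipped before calling the helper.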
Output:
The script creates a 1-mode digraph of Twitter users where an edge exists if:
- The author includes a tweet-at to another user. A directed tie is created from the author to the mentioned user.
- The author includes a retweet from the original author of a message. In this case, a directed tie is created from the user mentioned in the retweet to the author.
For example, the tweet "RT@cnn @johndoe Protest in Egypt #jan25" by user1 would create two ties:
- From user1 to johndoe
- From cnn to user1
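The two tie rules can be sketched with regular expressions as below. This is an illustrative reading of the rules, not the code in CreateGraph.py: the function name extract_edges and the patterns are assumptions, and usernames are assumed to consist of letters, digits, and underscores.

```python
import re

# "RT@user" or "RT @user" marks a retweet; any other "@user" is a tweet-at.
RETWEET = re.compile(r"\bRT\s?@(\w+)")
MENTION = re.compile(r"@(\w+)")


def extract_edges(author, message):
    """Return a list of (source, target) ties implied by one tweet."""
    edges = []
    # Retweet: tie runs from the retweeted user to the author.
    for source in RETWEET.findall(message):
        edges.append((source, author))
    # Remove retweet markers so they are not counted again as tweet-ats.
    remaining = RETWEET.sub(" ", message)
    # Tweet-at: tie runs from the author to the mentioned user.
    for target in MENTION.findall(remaining):
        edges.append((author, target))
    return edges


example_edges = extract_edges("user1", "RT@cnn @johndoe Protest in Egypt #jan25")
```

For the example tweet this yields one tie from cnn to user1 and one from user1 to johndoe, matching the two ties listed above.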
Two output files are generated:
- A static network (no time data) edgelist ending in .txt
- A dynamic network (nodes and edges have lifespans) in GEXF format. See http://gexf.net for more info.
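For the static output, a plain-text edgelist could be written as in the sketch below. The exact layout CreateGraph.py uses is not documented here, so one space-separated "source target" pair per line is an assumption, as is the helper name write_edgelist.

```python
import io


def write_edgelist(edges, handle):
    """Write one directed tie per line as 'source target' (assumed layout)."""
    for source, target in edges:
        handle.write(f"{source} {target}\n")


# Demonstration with an in-memory buffer; a real run would pass an open file.
buffer = io.StringIO()
write_edgelist([("user1", "johndoe"), ("cnn", "user1")], buffer)
edgelist_text = buffer.getvalue()
```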