
Create a database of Twitter follower relationships to visualize with Gephi later.


Twitter Friend of Friend

List of Followed Accounts on Twitter Retrieval Tool

I created this script to help visualize which Twitter accounts (target accounts) are being followed by a provided list of accounts (source accounts).

One use case is visualizing the graph of your friends on Twitter.

Another is finding out which accounts influence accounts known for being affected by and sharing malicious content on Twitter (such as disinformation).

It is possible to visualize the output data with Gephi later.

This project was largely inspired by How to download and visualize your Twitter network by Steve Hedden.

It is, however, adapted to use the TwitterAPI package instead of Tweepy.

This script uses Twitter API v2.

Installation

Install virtual environment (Optional)

python3 -m venv virtualenv

source virtualenv/bin/activate

Get prerequisites

pip install -r requirements.txt

Provide TwitterAPI authentication

Rename the .env.sample file to .env and provide needed API authentication values inside that file.

How to use

To retrieve the list of followed accounts, run

python main.py SOURCE_FILE

where SOURCE_FILE is a file containing the Twitter user IDs you would like to analyze, one per line.

To help visualize the data, the script looks up the Twitter @username for every provided ID.

These lookups add to the running time, so if you already know a username you can provide it after the ID (separated by a comma, without the @).

You do not have to provide the username for every record. Lines with and without usernames can be mixed in the same file.

Source file example

10001
10002,someTwitterUser
10003,anotherTwitterUser
...
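To make the file format concrete, here is a minimal sketch of how such a file could be parsed. It is not the script's own parser: the function name `parse_source_lines` and the choice to represent a missing username as `None` are assumptions made for illustration.

```python
import io


def parse_source_lines(lines):
    """Return a list of (user_id, username_or_None) tuples."""
    accounts = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        # Everything before the first comma is the ID, the rest is the username.
        user_id, _, username = line.partition(",")
        username = username.strip().lstrip("@") or None
        accounts.append((user_id.strip(), username))
    return accounts


# In-memory stand-in for a real source file.
sample = io.StringIO("10001\n10002,someTwitterUser\n10003,anotherTwitterUser\n")
accounts = parse_source_lines(sample)

# IDs that would still need a username lookup.
missing = [user_id for user_id, username in accounts if username is None]
print(accounts)
print(missing)
```

In this sample only `10001` lacks a username, so it is the only ID that would need a lookup.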

Twitter API v2 limits retrieval of an account's followed list to 15 requests per 15 minutes. To stay within this limit, the script sleeps for 16 minutes after every 15 requests. Additionally, at most 1000 followed accounts can be retrieved per request.

Therefore, expect long execution times for longer lists of Twitter users.
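A rough back-of-the-envelope estimate of the waiting time can be sketched as follows. The numbers 15, 16 and 1000 come from the description above. Everything else is an assumption: the function names are hypothetical, the sketch assumes every account is paged through completely, and it assumes one pause after each full batch of 15 requests. The script's own `--estimate-only` output may be computed differently.

```python
import math

REQUESTS_PER_WINDOW = 15  # requests allowed per 15 minutes
SLEEP_MINUTES = 16        # pause taken after each batch of 15 requests
PAGE_SIZE = 1000          # followed accounts returned per request


def requests_needed(following_counts):
    """Requests required if every account is paged through completely (assumption)."""
    return sum(max(1, math.ceil(count / PAGE_SIZE)) for count in following_counts)


def estimated_sleep_minutes(total_requests):
    """Minutes spent sleeping, assuming one pause after every full batch."""
    return (total_requests // REQUESTS_PER_WINDOW) * SLEEP_MINUTES


# Hypothetical following counts for 43 source accounts.
counts = [120, 2500, 980] + [300] * 40
total = requests_needed(counts)
print(total, estimated_sleep_minutes(total))
```

With these made-up counts the sketch needs 45 requests and spends 48 minutes sleeping, not counting the time of the requests themselves.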

Data retrieved from the Twitter API is stored in output/result.csv in the following format:

source,target
someTwitterUser,SomeTwitterNewsOutlet
someTwitterUser,OtherTwitterUserThisUserIsFollowing
...

Additionally, a copy of the source file with the missing usernames filled in is saved as output/source_accounts.csv.
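If you want to process the edge list yourself before or instead of loading it into Gephi, the result file can be read with Python's standard `csv` module. This is a minimal sketch that assumes the `source,target` header shown above. The helper name `read_edges` is hypothetical, and an in-memory string stands in for output/result.csv.

```python
import csv
import io


def read_edges(handle):
    """Read (source, target) pairs from a CSV with a 'source,target' header."""
    reader = csv.DictReader(handle)
    return [(row["source"], row["target"]) for row in reader]


# Stand-in for: open("output/result.csv", newline="")
sample = io.StringIO(
    "source,target\n"
    "someTwitterUser,SomeTwitterNewsOutlet\n"
    "someTwitterUser,OtherTwitterUserThisUserIsFollowing\n"
)
edges = read_edges(sample)
print(edges)
```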

Optional flags

  • Use --usernames-only flag to only retrieve usernames for your follower list.

  • Use --estimate-only flag to only output how long the script is expected to run.
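For readers unfamiliar with how a positional argument and optional flags combine on the command line, the sketch below models the interface described above with `argparse`. It only illustrates the calling convention and is not the parsing code used in main.py. The help texts and the file name `accounts.txt` are made up.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(
        description="Retrieve the accounts followed by a list of Twitter users."
    )
    parser.add_argument("source_file", help="file with one Twitter user ID per line")
    parser.add_argument("--usernames-only", action="store_true",
                        help="only look up usernames")
    parser.add_argument("--estimate-only", action="store_true",
                        help="only print the expected running time")
    return parser


# Equivalent to running: python main.py accounts.txt --estimate-only
args = build_parser().parse_args(["accounts.txt", "--estimate-only"])
print(args.source_file, args.usernames_only, args.estimate_only)
```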

Visualizing data with Gephi

I am only a beginner with Gephi, but here are some of the steps I found useful for visualizing the results.

First, import the result CSV file (output/result.csv) as an Edge table, with the edge type set to Directed.

Second, switch to the Data Laboratory tab, and under Copy data to another column select the column Id and ask Gephi to copy its contents to column Label. This will allow us to display Twitter usernames in the graph.


Switch back to the Overview tab. If you have too many data points, you might want to filter out accounts with a small number of connections. I use the Topology/K-core filter.
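To show what a k-core filter does conceptually, here is a small pure-Python sketch of the textbook definition: nodes with fewer than k neighbours are removed repeatedly until none are left to remove. It ignores edge direction and duplicate edges, which is an assumption. How Gephi's filter treats those details has not been checked here.

```python
from collections import defaultdict


def k_core(edges, k):
    """Return the nodes that survive repeated removal of nodes with < k neighbours."""
    neighbours = defaultdict(set)
    for source, target in edges:
        if source != target:
            neighbours[source].add(target)
            neighbours[target].add(source)  # direction is ignored (assumption)
    nodes = set(neighbours)
    changed = True
    while changed:
        # Count only neighbours that are still part of the graph.
        weak = {n for n in nodes if len(neighbours[n] & nodes) < k}
        nodes -= weak
        changed = bool(weak)
    return nodes


edges = [("a", "x"), ("b", "x"), ("a", "y"), ("b", "y"), ("c", "z")]
core = sorted(k_core(edges, 2))
print(core)
```

In this toy graph, `c` and `z` are connected only to each other, so a 2-core drops them and keeps `a`, `b`, `x` and `y`.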


Next, we need to compute Out-Degree so that the accounts in our initial list can be told apart from the accounts that were not in it and were retrieved by this script. Switch from the Filters tab to the Statistics tab and press the Run button next to Degree.


Now we can use the value of Out-Degree to visually separate the data under the Appearance tab. I selected a green color for nodes with Out-Degree equal to 0 (the accounts retrieved from the Twitter API rather than taken from the source list).
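The reasoning behind using Out-Degree can be checked outside Gephi as well. In the edge list every source account appears in the first column, so a node that never appears there has Out-Degree 0 and can only have come from the API results. The sketch below uses made-up edges in the output format described earlier.

```python
from collections import Counter

# Made-up edges in the same (source, target) form as output/result.csv.
edges = [
    ("someTwitterUser", "SomeTwitterNewsOutlet"),
    ("someTwitterUser", "anotherTwitterUser"),
    ("anotherTwitterUser", "SomeTwitterNewsOutlet"),
]

# Out-degree: how many accounts each node follows within this data set.
out_degree = Counter(source for source, _ in edges)
nodes = {name for edge in edges for name in edge}

# Nodes that follow nobody in the data were only ever seen as targets.
discovered = sorted(n for n in nodes if out_degree[n] == 0)
print(discovered)
```

Here `anotherTwitterUser` is both followed and a follower, so only `SomeTwitterNewsOutlet` ends up in the Out-Degree 0 group.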


We can also experiment with other values, for example making the size of each node reflect its In-Degree value (accounts followed by more people will have larger circles).
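As an illustration of sizing nodes by In-Degree, the sketch below counts how often each account appears as a target and maps that count linearly onto a size range. The linear mapping, the function name `node_sizes` and the default sizes are assumptions chosen for the example. They are not a description of how Gephi computes sizes.

```python
from collections import Counter


def node_sizes(edges, min_size=10.0, max_size=50.0):
    """Map each node's in-degree linearly onto [min_size, max_size]."""
    in_degree = Counter(target for _, target in edges)
    nodes = {name for edge in edges for name in edge}
    top = max(in_degree.values(), default=0)
    sizes = {}
    for node in nodes:
        share = in_degree[node] / top if top else 0.0
        sizes[node] = min_size + share * (max_size - min_size)
    return sizes


edges = [("a", "news"), ("b", "news"), ("a", "b")]
sizes = node_sizes(edges)
print(sizes)
```

In this example `news` is followed by two accounts and gets the largest size, `b` is followed by one and lands in the middle, and `a` is followed by nobody and stays at the minimum.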

Finally, run the Force Atlas 2 layout algorithm to make the data more readable.

Depending on your data, the result could look something like this. (Actual Twitter usernames have been obfuscated into hashes)

Gephi lets us explore the data interactively. We can, for example, highlight a target account and see all the source accounts that follow it.


Visualizing data without Gephi

If you wish to visualize the results without Gephi (e.g. with Python libraries such as matplotlib), you can refer to the Steve Hedden article mentioned above.