This repo covers social media analytics with Neo4j, using Twitter data. It walks step by step through importing Twitter data, parsing it, analyzing it, and visualizing it.
The data model was created with arrows.app, a Neo4j Labs tool.
We use the Twitter Premium v1.1 API, so we need a Twitter Developer Account with premium access.
Because we use the Twitter API, every API request must be authenticated. Here, a Bearer Token is used for authentication.
We can get our Bearer Token from the Twitter Developer Portal.
NOTE: Keep the Bearer Token after it is generated.
The URL format is https://api.twitter.com/1.1/tweets/search/30day/twitter.json?<parameters>
Method: GET
Check the available parameters here
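As a rough illustration of how the Bearer Token, the URL format, and the GET method fit together, here is a minimal Python sketch using only the standard library. The helper name `build_search_request` and the environment variable `TWITTER_BEARER_TOKEN` are assumptions made for this example; they are not part of the repo or of the Twitter API.

```python
import os
import urllib.request


def build_search_request(url: str, bearer_token: str) -> urllib.request.Request:
    """Return a GET request carrying the Bearer Token in the Authorization header."""
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {bearer_token}"},
        method="GET",
    )


# TWITTER_BEARER_TOKEN is a hypothetical variable name chosen for this sketch;
# the placeholder is used when the variable is not set.
token = os.environ.get("TWITTER_BEARER_TOKEN", "<bearer_token>")
request = build_search_request(
    "https://api.twitter.com/1.1/tweets/search/30day/twitter.json?query=g20",
    token,
)
# The request is only built here; sending it would be
# urllib.request.urlopen(request), which needs a valid token.
print(request.get_method(), request.full_url)
```

Reading the token from the environment keeps it out of source control, which is one way to follow the note about keeping the token safe.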
Let's first look at the response to an API request in Postman.
URL: https://api.twitter.com/1.1/tweets/search/30day/twitter.json?maxResults=100&query=g20%20indonesia%20lang:en&fromDate=202209280000&toDate=202209290000
Params:
- query: g20%20indonesia%20lang:en
- fromDate: 202209280000
- toDate: 202209290000
- maxResults: 100
Here, our keyword is g20 indonesia, and lang:en restricts the results to English tweets.
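The request URL above can also be assembled programmatically. The sketch below is one possible way to do it in Python with the standard library; the helper names `to_api_timestamp` and `build_search_url` are made up for this example. It assumes that fromDate and toDate use the yyyyMMddHHmm layout seen in the Postman request.

```python
from datetime import datetime
from urllib.parse import quote, urlencode

BASE_URL = "https://api.twitter.com/1.1/tweets/search/30day/twitter.json"


def to_api_timestamp(moment: datetime) -> str:
    """Format a datetime as the yyyyMMddHHmm string used by fromDate/toDate."""
    return moment.strftime("%Y%m%d%H%M")


def build_search_url(query: str, from_date: datetime, to_date: datetime,
                     max_results: int = 100) -> str:
    """Build the search URL with the same parameters as the Postman request."""
    params = {
        "maxResults": str(max_results),
        "query": query,
        "fromDate": to_api_timestamp(from_date),
        "toDate": to_api_timestamp(to_date),
    }
    # quote (rather than quote_plus) encodes spaces as %20; ":" is left as-is
    # so that operators such as lang:en stay readable.
    return BASE_URL + "?" + urlencode(params, quote_via=quote, safe=":")


url = build_search_url(
    "g20 indonesia lang:en",
    datetime(2022, 9, 28),
    datetime(2022, 9, 29),
)
print(url)
```

Building the URL from a parameter dictionary avoids hand-encoding spaces and makes it harder to misspell a parameter name such as maxResults.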
Importing Twitter data into Neo4j takes several steps: storing a static value, creating constraints, and parsing the data so we can load it.
The first thing to do is store a static value in Neo4j, which is our Bearer Token.
CALL apoc.static.set("twitter.bearer", "<bearer_token>")
To check the static values stored under the prefix twitter in Neo4j:
RETURN apoc.static.getAll("twitter")
From our data model, we know there are 3 node labels, and from the response in Postman we know their primary keys. We can use that information to create constraints.
Create NODE KEY constraints for the labels Tweet and User, using their primary key as the node property.
CREATE CONSTRAINT tweet_id IF NOT EXISTS FOR (n:Tweet) REQUIRE (n.conversation_id) IS NODE KEY;
CREATE CONSTRAINT user_id IF NOT EXISTS FOR (n:User) REQUIRE (n.user_id) IS NODE KEY;
// Read the stored Bearer Token, call the API, and preview the first 5 statuses
WITH apoc.static.getAll("twitter") AS twitter
WITH "g20" AS keyword, "202209280000" AS fromDate, "202209290000" AS toDate, twitter
WITH "https://api.twitter.com/1.1/tweets/search/30day/twitter.json?maxResults=100&query=" + keyword + "&fromDate=" + fromDate + "&toDate=" + toDate AS uri, twitter
CALL apoc.load.jsonParams(
  uri,
  {Authorization: "Bearer " + twitter.bearer},
  null
)
YIELD value
// Each element of value.results is one tweet (status)
UNWIND value.results AS status
RETURN status LIMIT 5
the query result is
WITH apoc.static.getAll("twitter") AS twitter
WITH "g20%20indonesia" as keyword, "202209280000" as fromDate, "202209290000" as toDate, twitter
WITH "https://api.twitter.com/1.1/tweets/search/30day/twitter.json?maxResults=100&query="+keyword+"&fromDate="+fromDate+"&toDate="+toDate as uri, twitter
CALL apoc.load.jsonParams(
uri,
{Authorization:"Bearer "+twitter.bearer},
null
)
YIELD value
UNWIND value.results as status
MERGE (n:Tweet{conversation_id:status.id})
ON CREATE SET
n.tweet=status.text,
n.created_at=status.created_at,
n.retweet_count=status.retweet_count,
n.favorite_count=status.favorite_count
/// user post
MERGE (u:User{user_id:status.user.id })
ON CREATE SET
u.name= status.user.name,
u.screen_name=status.user.screen_name,
u.description=status.user.description,
u.followers_count=toInteger(status.user.followers_count),
u.following_count=toInteger(status.user.friends_count),
u.account_created_at=status.user.created_at,
u.favourites_count=toInteger(status.user.favourites_count),
u.verified=status.user.verified,
u.profile_image_url=status.user.profile_image_url
MERGE (u)-[:POST]->(n)
// Hashtags Tweet
FOREACH (i in status.entities.hashtags |
MERGE (hs:Hashtags{text:i.text})
MERGE (n)-[:TAG]->(hs)
)
///user mention
FOREACH (
i in status.entities.user_mentions |
MERGE (um:User{user_id:i.id})
ON CREATE SET
um.name=i.name,
um.screen_name=i.screen_name
MERGE (n)-[rm1:MENTIONS]->(um)
)
/// reply
FOREACH(
ignoreMe IN CASE WHEN status.in_reply_to_status_id is not null
THEN [1] ELSE [] END |
MERGE (urp:User{user_id:status.in_reply_to_user_id})
ON CREATE SET
urp.screen_name=status.in_reply_to_screen_name
MERGE (urp)-[:REPLY]->(n)
)
// Retweet
FOREACH(
ignoreMe IN CASE WHEN status.retweeted_status.id is not null THEN [1] ELSE [] END |
MERGE (rt:Tweet{conversation_id:status.retweeted_status.id})
ON CREATE SET
rt.created_at=status.retweeted_status.created_at,
rt.tweet=status.retweeted_status.text,
rt.retweet_count=status.retweeted_status.retweet_count,
rt.favorite_count=status.retweeted_status.favorite_count
MERGE (n)-[:RETWEET]->(rt)
// Hashtags Retweet
FOREACH (i in status.retweeted_status.entities.hashtags |
MERGE (hs1:Hashtags{text:i.text})
MERGE (rt)-[:TAG]->(hs1)
)
// Retweet post
FOREACH(
ignoreMe IN CASE WHEN status.retweeted_status.user.id is not null THEN [1] ELSE [] END |
MERGE (u_rt_po:User{user_id:status.retweeted_status.user.id })
ON CREATE SET
u_rt_po.name= status.retweeted_status.user.name,
u_rt_po.screen_name=status.retweeted_status.user.screen_name,
u_rt_po.description=status.retweeted_status.user.description,
u_rt_po.followers_count=toInteger(status.retweeted_status.user.followers_count),
u_rt_po.following_count=toInteger(status.retweeted_status.user.friends_count),
u_rt_po.account_created_at=status.retweeted_status.user.created_at,
u_rt_po.favourites_count=toInteger(status.retweeted_status.user.favourites_count),
u_rt_po.verified=status.retweeted_status.user.verified,
u_rt_po.profile_image_url=status.retweeted_status.user.profile_image_url
MERGE (u_rt_po)-[:POST]->(rt)
)
// retweet mention
FOREACH (
i in status.retweeted_status.entities.user_mentions |
MERGE (u_rt_mt:User{user_id:i.id})
ON CREATE SET u_rt_mt.name=i.name,
u_rt_mt.screen_name=i.screen_name
MERGE (rt)-[:MENTIONS]->(u_rt_mt)
)
// retweet reply
FOREACH(
ignoreMe IN CASE WHEN status.retweeted_status.in_reply_to_status_id is not null THEN [1] ELSE [] END |
MERGE (u_rt_rp:User{user_id:status.retweeted_status.in_reply_to_user_id})
ON CREATE SET
u_rt_rp.screen_name=status.retweeted_status.in_reply_to_screen_name
MERGE (u_rt_rp)-[:REPLY]->(rt)
)
)
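To make the shape of the import query easier to follow, here is a simplified Python sketch of which relationships it creates for a single status. It only mirrors the top-level MERGE clauses (POST, TAG, MENTIONS, REPLY, RETWEET) and skips the nested handling of the retweeted status. The function name `status_to_edges` and the sample status are made up for illustration and are not real API output.

```python
def status_to_edges(status: dict) -> list:
    """Return (start, relationship, end) triples mirroring the top-level MERGEs."""
    tweet = ("Tweet", status["id"])
    # (u:User)-[:POST]->(n:Tweet)
    edges = [(("User", status["user"]["id"]), "POST", tweet)]
    entities = status.get("entities", {})
    # (n)-[:TAG]->(hs:Hashtags), one per hashtag
    for tag in entities.get("hashtags", []):
        edges.append((tweet, "TAG", ("Hashtags", tag["text"])))
    # (n)-[:MENTIONS]->(um:User), one per mentioned user
    for mention in entities.get("user_mentions", []):
        edges.append((tweet, "MENTIONS", ("User", mention["id"])))
    # (urp:User)-[:REPLY]->(n), only when the status is a reply
    if status.get("in_reply_to_status_id") is not None:
        edges.append((("User", status["in_reply_to_user_id"]), "REPLY", tweet))
    # (n)-[:RETWEET]->(rt:Tweet), only when the status is a retweet
    retweeted = status.get("retweeted_status")
    if retweeted is not None:
        edges.append((tweet, "RETWEET", ("Tweet", retweeted["id"])))
    return edges


# Hypothetical, heavily trimmed status used only to exercise the function.
sample_status = {
    "id": 1,
    "user": {"id": 10},
    "entities": {"hashtags": [{"text": "G20"}], "user_mentions": [{"id": 20}]},
    "in_reply_to_status_id": None,
    "retweeted_status": {"id": 2},
}

for edge in status_to_edges(sample_status):
    print(edge)
```

The `if ... is not None` checks play the same role as the `FOREACH (ignoreMe IN CASE WHEN ... THEN [1] ELSE [] END | ...)` pattern in the Cypher query, which runs its body once when the condition holds and not at all otherwise.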
After the data is loaded, we can check the schema
CALL db.schema.visualization()
and the data in Neo4j.
Finally, we can start to analyze the data ...
NOTE: The data is loaded ONLY from the first page of the Twitter API response.
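Loading more than the first page would mean following the pagination token. The Python sketch below shows the general loop. It assumes that each response carries a `next` token while more pages remain and that the token is sent back as the `next` request parameter. The function `iter_statuses`, the `max_pages` cap, and the fake pages are all inventions of this example; the real HTTP call is replaced by a stand-in so the loop can be run offline.

```python
from typing import Callable, Iterator, Optional


def iter_statuses(fetch_page: Callable[[Optional[str]], dict],
                  max_pages: int = 10) -> Iterator[dict]:
    """Yield statuses page by page, following the `next` token until it is absent."""
    next_token: Optional[str] = None
    for _ in range(max_pages):
        page = fetch_page(next_token)
        yield from page.get("results", [])
        next_token = page.get("next")
        if next_token is None:
            break


# Stand-in for the real API: two fake pages keyed by the token that requests them.
FAKE_PAGES = {
    None: {"results": [{"id": 1}, {"id": 2}], "next": "page-2"},
    "page-2": {"results": [{"id": 3}]},
}

statuses = list(iter_statuses(lambda token: FAKE_PAGES[token]))
print([status["id"] for status in statuses])
```

The `max_pages` cap is a design choice of this sketch to keep the number of requests bounded, since premium access is typically limited in how many requests it allows.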