My personal data engineer project to make a microblogging like twitter.
- Imagine your team will make a microblogging like twitter. People can tweet, reply, like,
and retweet. Please,
a. Design a database with the relationship for microblogging use-case.
b. From the database provides the query for,
- Creating all tables
- Showing all tweets from specific user includes reply, like, retweet counts
- Showing specific tweet information, includes all replies, who like the tweet, and lo retweet the tweet. c. [optional] Create diagram if the company need to create data warehouse from existing database. You can use any stack you confident.
-
Based on the given use-case, we can design the following database schema:
a. User Table:
- user_id (Primary Key)
- username
- password
b. Tweet Table:
- tweet_id (Primary Key)
- user_id (Foreign Key referencing User table)
- content
- created_at
c. Reply Table:
- reply_id (Primary Key)
- tweet_id (Foreign Key referencing Tweet table)
- user_id (Foreign Key referencing User table)
- content
- created_at
d. Likes Table:
- like_id (Primary Key)
- tweet_id (Foreign Key referencing Tweet table)
- user_id (Foreign Key referencing User table)
- created_at
e. Retweet Table:
- retweet_id (Primary Key)
- tweet_id (Foreign Key referencing Tweet table)
- user_id (Foreign Key referencing User table)
- created_at
-
Query Examples:
a. Creating all tables:
CREATE TABLE User ( user_id INT PRIMARY KEY, username VARCHAR(255), email VARCHAR(255), password VARCHAR(255) ); CREATE TABLE Tweet ( tweet_id INT PRIMARY KEY, user_id INT, content VARCHAR(255), created_at TIMESTAMP, FOREIGN KEY (user_id) REFERENCES User(user_id) ); CREATE TABLE Reply ( reply_id INT PRIMARY KEY, tweet_id INT, user_id INT, content VARCHAR(255), created_at TIMESTAMP, FOREIGN KEY (tweet_id) REFERENCES Tweet(tweet_id), FOREIGN KEY (user_id) REFERENCES User(user_id) ); CREATE TABLE Likes ( like_id INT PRIMARY KEY, tweet_id INT, user_id INT, created_at TIMESTAMP, FOREIGN KEY (tweet_id) REFERENCES Tweet(tweet_id), FOREIGN KEY (user_id) REFERENCES User(user_id) ); CREATE TABLE Retweet ( retweet_id INT PRIMARY KEY, tweet_id INT, user_id INT, created_at TIMESTAMP, FOREIGN KEY (tweet_id) REFERENCES Tweet(tweet_id), FOREIGN KEY (user_id) REFERENCES User(user_id) );
b. Showing all tweets from a specific user includes reply, like, and retweet counts:
SELECT t.tweet_id, t.content, COUNT(r.reply_id) AS reply_count, COUNT(l.like_id) AS like_count, COUNT(rt.retweet_id) AS retweet_count FROM Tweet t LEFT JOIN Reply r ON t.tweet_id = r.tweet_id LEFT JOIN Likes l ON t.tweet_id = l.tweet_id LEFT JOIN Retweet rt ON t.tweet_id = rt.tweet_id WHERE t.user_id = <user_id> GROUP BY t.tweet_id, t.content;
c. Showing specific tweet information includes all replies, who liked the tweet, and who retweeted the tweet:
SELECT t.tweet_id, t.content, u.username AS user_liked, u2.username AS user_retweeted, r.content AS reply_content FROM Tweet t LEFT JOIN Likes l ON t.tweet_id = l.tweet_id LEFT JOIN User u ON l.user_id = u.user_id LEFT JOIN Retweet rt ON t.tweet_id = rt.tweet_id LEFT JOIN User u2 ON rt.user_id = u2.user_id LEFT JOIN Reply r ON t.tweet_id = r.tweet_id WHERE t.tweet_id = <tweet_id>;
-
Here is an example of a data warehouse diagram, for raw files, can be accessed here.
-
In the data warehouse diagram above, we have the following components:
- User Dim: This dimension table contains user-related information, such as user_id, username, and email.
- Tweet Dim: This dimension table contains tweet-related information, such as tweet_id, user_id (foreign key referencing User Dim), content, and created_at.
- Date Dim: This dimension table contains date-related information, such as date_id and date. This table can be used to analyze trends and time-based metrics.
- Reply Fact: This fact table captures the replies to tweets. It contains reply_id, tweet_id (foreign key referencing Tweet Dim), user_id (foreign key referencing User Dim), content, and created_at.
- Likes Fact: This fact table captures the likes on tweets. It contains like_id, tweet_id (foreign key referencing Tweet Dim), user_id (foreign key referencing User Dim), and created_at.
- Retweet Fact: This fact table captures the retweets of tweets. It contains retweet_id, tweet_id (foreign key referencing Tweet Dim), user_id (foreign key referencing User Dim), and created_at.
-
This data warehouse diagram represents the dimensional model for the microblogging use-case, allowing for efficient analysis and reporting on various metrics related to users, tweets, replies, likes, and retweets.