A case study performed on fictitious customer data from DVD Rental Co as a part of the Serious SQL course by Danny Ma.
- π¨ Problem Statement
- βοΈ Requirements
- π Exploratory Data Analysis
- π Data Joining
- π SQL Problem Solving
- π£ Final Outputs
The customer analytics team at DVD Rental Co has asked for our help with generating the necessary data points required to populate specific parts of this first-ever customer email campaign. Personalized customer emails based off marketing analytics is a winning formula for many digital companies, and this is exactly the initiative that the leadership team at DVD Rental Co has decided to tackle!
The marketing team have shared with us a draft of the email they wish to send to their customers. Take a look at the expected final product, before we dive into the specific requirements.
View Requirement Details
For each customer, we need to identify the top 2 categories
for each customer based off their past rental history. These top categories will drive marketing creative images as seen in the sci-fi and rom-com examples in the draft email.
View Requirement Details
The marketing team has also requested for the 3 most popular films
for each customerβs top 2 categories
.
There is a catch though - we cannot recommend a film which the customer has already viewed.
If there are less than 3 films available - marketing is happy to show at least 1 film.
π‘ Any customer which do not have any film recommendations for either category must be flagged out so the marketing team can exclude from the email campaign - this is of high importance!
View Requirement Details
The number of films watched by each customer in their top 2 categories is required as well as some specific insights.
How many total films have they watched in their top category? How many more films has the customer watched compared to the average DVD Rental Co customer? How does the customer rank in terms of the top X% compared to all other customers in this film category?
How many total films has the customer watched in this category? What proportion of each customerβs total films watched does this count make?
π‘ Note the specific rounding of the percentages with 0 decimal places!
View Requirement Details
Along with the top 2 categories, marketing has also requested top actor
film recommendations where up to 3 more films are included in the recommendations list as well as the count of films by the top actor.
We have been given guidance by marketing to choose the actors in alphabetical order should there be any ties - i.e. if the customer has seen 5 Brad Pitt films vs 5 George Clooney films - Brad Pitt will be chosen instead of George Clooney.
The same logical business rules apply - in addition any films that have already been recommended in the top 2 categories must not be included as actor recommendations.
If the customer doesnβt have at least 1 film recommendation - they also need to be flagged with a separate actor exclusion flag.
The primary step to any data analysis is to first understand the data. The folks at DVD Rental Co. have given us 7 tables to work with. We must be able to derive as much information from these individual tables before deciding what type of joins can be used to obtain a final table that'll help us get answers.
EDA Re-Cap
Data Mapping Journey
SNo | Start | End | Join On |
---|---|---|---|
Step 1 | rental |
inventory |
inventory_id |
Step 2 | inventory |
film |
film_id |
Step 3 | film |
film_category |
film_id |
Step 4 | film_category |
category |
category_id |
We lastly identified key columns and all the tables that will provide us with these details. It is natural to face a dilemma about what kind of join to use in order to retain only the data that is important to us. Is it going to be a left-join
or an inner-join
? Let's try to tackle these questions and truly find the purpose of our joins..
BASE TABLE RECAP
DROP TABLE IF EXISTS complete_joint_dataset;
CREATE TEMP TABLE complete_joint_dataset AS
SELECT
rental.customer_id,
inventory.film_id,
film.title,
film_category.category_id,
category.name AS category_name
FROM dvd_rentals.rental
INNER JOIN dvd_rentals.inventory
ON rental.inventory_id = inventory.inventory_id
INNER JOIN dvd_rentals.film
ON inventory.film_id = film.film_id
INNER JOIN dvd_rentals.film_category
ON film.film_id = film_category.film_id
INNER JOIN dvd_rentals.category
ON film_category.category_id = category.category_id;
SELECT * FROM complete_joint_dataset limit 2;
Customer_Id | Film_Id | Title | category_id | category |
---|---|---|---|---|
130 | 80 | BLANKET BEVERLY | 8 | Family |
459 | 333 | FREAKY POCUS | 12 | Music |
We are looking into customer_id = 1
rental insights to generate an e-mail template.
Top 2 Categories
Customer_Id | category_name | rental_count | category_rank |
---|---|---|---|
1 | Classics | 6 | 1 |
1 | Comedy | 5 | 2 |
Category Insights
First Category Insights
Customer_Id | category_name | rental_count | average_comparison | percentile |
---|---|---|---|---|
1 | Classics | 6 | 4 | 1 |
Second Category Insights
Customer_Id | category_name | rental_count | total_percentage |
---|---|---|---|
1 | Comedy | 5 | 16 |
Category Recommendations
Customer_Id | category_name | category_rank | film_id | title | rental_count | reco_rank |
---|---|---|---|---|---|---|
1 | Classics | 1 | 891 | Timberland Sky | 31 | 1 |
1 | Classics | 1 | 358 | Gilmore Boiled | 31 | 2 |
1 | Classics | 1 | 951 | Voyage Legally | 28 | 3 |
1 | Comedy | 2 | 1000 | Zorro Ark | 31 | 1 |
1 | Comedy | 2 | 127 | Cat Coneheads | 30 | 2 |
1 | Comedy | 2 | 638 | Operation Operation | 27 | 3 |
Top Actor
Customer Id | Actor Id | First Name | Last Name | Rental Count |
---|---|---|---|---|
1 | 37 | VAL | BOLGER | 6 |
Actor Recommendations
Customer_Id | first_name | last_name | rental_count | title | film_id | actor_id | reco_rank |
---|---|---|---|---|---|---|---|
1 | Val | Bolger | 6 | Primary Glass | 697 | 37 | 1 |
1 | Val | Bolger | 6 | Alaska Phantom | 12 | 37 | 2 |
1 | Val | Bolger | 6 | Metropolis Coma | 572 | 37 | 3 |
Drumroll please
Here's what's waiting for the customer!
Β© Akshaya Parthasarathy, 2021
For feedback, or if you just feel like saying Hi!