This project covers planning and creating a relational database in PostgreSQL for an online service system
The first step is to design an ER diagram to show entity and its relationships.
- Customer Table: Contains attributes to adequately capture customer information. There is a 1-n relationship between customer and orders table and a 1-1 relationship between customer and customer_loyalty table. Meaning, each customer can have multiple orders (and not be placed at once, explained later.) but there can only be one customer associated to an order. Also, each customer can be enrolled once in a points program to reward their loyalty.
- Employee Table: Contains attributes to adequately capture employee information. There is a 1-n relationship between customer and orders table. Meaning, each employee can have multiple orders but there can only be one employee associated to an order.
- Rating Table: Contains attributes to adequately describe a rating. Done to prevent vertical replication of text.
- Status Table: Contains attributes to adequately describe a status. Done to prevent vertical replication of text.
- Order Type Table: Contains attributes to adequately describe an order type. Done to prevent vertical replication of text and numbers.
- Orders Table: Contains attributes to adequately describe an order. There is a 1-n relationship between itself and 2 other tables (Customer & Employee). There is a 1-1 relationship between orders and order type table. Meaning, each order can only be of one such type. There is a 1-1 relationship between orders and rating. Meaning, each order can only be one rating submitted per order. There is a 1-1 relationship between orders and status. Meaning, there can only be one status assigned at any given point to an order.
- Customer Loyalty Table: Contains attributes to adequately describe a customers loyalty over time. There is a 1-1 relationship between customer and customer_loyalty table.
- Each data type is thoroughly thought after to reduce unnecessary storage space.
- Each data type enforces a strict set of rules to be followed.
The second step is to create the schema w.r.t the above ERD diagram. Here we specify constraints to inforce certain logic in our database.
CREATE TABLE Customer (
customer_id serial PRIMARY KEY
first_name varchar(20) NOT NULL,
last_name varchar(20) NOT NULL,
email_id varchar(50) NOT NULL,
phone_no varchar(15) NOT NULL,
address1 varchar(50) NOT NULL,
address2 varchar(50) NULL,
state varchar(20) NOT NULL,
pin_code varchar(10) NOT NULL,
customer_rating real NOT NULL,
last_login timestamp NOT NULL,
created_at timestamp NOT NULL,
updated_at timestamp NOT NULL,
UNIQUE(email_id,phone_no)
);
CREATE TABLE Rating (
rating_id serial NOT NULL PRIMARY KEY,
rating_desc varchar(10) NOT NULL UNIQUE
);
CREATE TABLE Status (
status_id serial NOT NULL PRIMARY KEY,
status varchar(10) NOT NULL UNIQUE
);
CREATE TABLE Employee (
employee_id serial NOT NULL PRIMARY KEY,
first_name varchar(20) NOT NULL,
last_name varchar(20) NOT NULL,
govt_id varchar(20) NOT NULL UNIQUE,
phone_no varchar(15) NOT NULL UNIQUE,
employee_rating real NOT NULL,
created_at timestamp NOT NULL,
updated_at timestamp NOT NULL
);
CREATE TABLE Order_Type (
orderType_id serial NOT NULL PRIMARY KEY,
orderType_desc text NOT NULL,
orderType_price integer NOT NULL
);
CREATE TABLE Orders (
order_id serial NOT NULL PRIMARY KEY,
customer_id int NOT NULL,
order_type int NOT NULL,
employee_id int NOT NULL,
feedback smallint NOT NULL,
status smallint NOT NULL,
created_at timestamp NOT NULL,
updated_at timestamp NOT NULL,
UNIQUE (order_id, customer_id, status),
CONSTRAINT fk_Order_customer_id FOREIGN KEY (customer_id) REFERENCES Customer (customer_id) ON DELETE CASCADE,
CONSTRAINT fk_Order_order_type FOREIGN KEY (order_type) REFERENCES Order_Type (orderType_id) ON DELETE SET NULL,
CONSTRAINT fk_Order_employee_id FOREIGN KEY (employee_id) REFERENCES Employee (employee_id) ON DELETE SET NULL,
CONSTRAINT fk_Order_feedback FOREIGN KEY (feedback) REFERENCES Rating (rating_id) ON DELETE SET NULL,
CONSTRAINT fk_Order_status FOREIGN KEY (status) REFERENCES Status (status_id) ON DELETE SET NULL
);
CREATE TABLE Customer_Loyalty (
row_id serial NOT NULL PRIMARY KEY,
customer_id int NOT NULL,
orders_placed int NOT NULL,
money_spent int NOT NULL,
points int NOT NULL,
CONSTRAINT fk_Customer_Loyalty_customer_id FOREIGN KEY (customer_id) REFERENCES Customer (customer_id) ON DELETE CASCADE
);
- Each constraint is set to inforce a rule to align with business logic. For example:
- In orders table, there is a unique constraint set on 2 columns to allow 1 order per customer is allowed to be in an active state.
- Once a customer deletes their profile, for data protection and privacy, all their order history and loyalty points are deleted.
Using chatgpt, I created mock data and ingested it into tables using python.
The data files (.csv) can be found in the repository.
import psycopg2
conn = psycopg2.connect(
host='localhost',
dbname= 'HomeServices',
user='postgres',
password='1234567',
port='5432'
)
cursor = conn.cursor()
import os
import pandas as pd
df = pd.read_csv(r"C:\xyz\customer.csv")
for row in df.itertuples():
cursor.execute("INSERT INTO public.customer(first_name,last_name,email_id,phone_no,address1,address2,state,pin_code,customer_rating,last_login,created_at,updated_at) VALUES (%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)",
(row.first_name,
row.last_name,
row.email_id,
row.phone_no,
row.address1,
row.address2,
row.state,
row.pin_code,
row.customer_rating,
row.last_login,
row.created_at,
row.updated_at))
conn.commit()
df = pd.read_csv(r"C:\xyz\employee.csv")
for row in df.itertuples():
cursor.execute("INSERT INTO public.employee(first_name,last_name,govt_id,phone_no,employee_rating,created_at,updated_at) VALUES (%s,%s,%s,%s,%s,%s,%s)",
(row.first_name,
row.last_name,
row.govt_id,
row.phone_no,
row.employee_rating,
row.created_at,
row.updated_at))
conn.commit()
df = pd.read_csv(r"C:\xyz\ordertype.csv")
for row in df.itertuples():
cursor.execute("INSERT INTO public.order_Type(orderType_desc,orderType_price) VALUES (%s,%s)",
(row.orderType_desc,
row.orderType_price))
conn.commit()
df = pd.read_csv(r"C:\xyz\rating.csv")
for row in df.itertuples():
cursor.execute(f"INSERT INTO public.rating(rating_desc) VALUES ('{row.rating_desc}')")
conn.commit()
df = pd.read_csv(r"C:\xyz\status.csv")
for row in df.itertuples():
cursor.execute(f"INSERT INTO public.status(status) VALUES ('{row.status}')")
conn.commit()
df = pd.read_csv(r"C:\xyz\orders.csv")
for row in df.itertuples():
cursor.execute("INSERT INTO public.orders(customer_id,order_type,employee_id,feedback,status,created_at,updated_at) VALUES (%s,%s,%s,%s,%s,%s,%s)",
(row.customer_id,
row.order_type,
row.employee_id,
row.feedback,
row.status,
row.created_at,
row.updated_at))
conn.commit()
cursor.close()
conn.close()
The table customer_loyalty should populate based on their order history, and points will be awareded based on that. The points calculation is as follows:
- For customers with total orders below 20 get 15% of total money spent as reward points.
- For customer with total orders below 50 get 20% of total money spent as reward points.
- For customer with total orders above 100 get 25% of total money spent as reward points.
The following stored procedure populates all the columns:
CREATE OR REPLACE PROCEDURE calculate_customer_loyalty()
LANGUAGE plpgsql
AS $$
DECLARE
customer_row RECORD;
pointer INTEGER;
total_orders INTEGER;
total_money_spent INTEGER;
reward_points INTEGER;
BEGIN
-- Loop through each customer
pointer = 1;
DELETE FROM customer_loyalty;
FOR customer_row IN SELECT customer_id FROM Customer LOOP
-- Calculate total orders placed by the customer
SELECT COUNT(order_id) INTO total_orders FROM Orders WHERE customer_id = customer_row.customer_id;
-- Calculate total money spent by the customer
SELECT SUM(orderType_price) INTO total_money_spent FROM Orders
INNER JOIN Order_Type ON Orders.order_type = Order_Type.orderType_id
WHERE customer_id = customer_row.customer_id;
-- Calculate reward points based on total orders and money spent
IF total_orders < 20 THEN
reward_points := total_money_spent * 0.15;
ELSIF total_orders < 50 THEN
reward_points := total_money_spent * 0.20;
ELSE
reward_points := total_money_spent * 0.25;
END IF;
-- Insert or update loyalty information for the customer
INSERT INTO Customer_Loyalty (row_id, customer_id, orders_placed, money_spent, points)
VALUES (pointer,customer_row.customer_id, total_orders, total_money_spent, reward_points);
pointer = pointer +1;
END LOOP;
END;
$$
CALL calculate_customer_loyalty();
-
Customer Table:
-
Employee Table:
-
Order Type Table:
-
Rating Table:
-
Status Table:
-
Orders Table:
-
Loyalty Table:
Turns out, customer_loyalty table has no date columns. Lets fix that. To do so, we need to ALTER and UPDATE the table.
ALTER TABLE customer_loyalty ADD COLUMN created_at TIMESTAMP;
UPDATE customer_loyalty SET created_at = NOW() WHERE created_at IS NULL RETURNING *;
Lets also modify the stored procedure to include the new column as well as only consider orders with status 'completed':
CREATE OR REPLACE PROCEDURE calculate_customer_loyalty()
LANGUAGE plpgsql
AS $$
DECLARE
customer_row RECORD;
total_orders INTEGER;
total_money_spent INTEGER;
reward_points INTEGER;
BEGIN
-- Loop through each customer
FOR customer_row IN SELECT customer_id FROM Customer LOOP
-- Calculate total orders placed by the customer with status 3
SELECT COUNT(order_id) INTO total_orders FROM Orders WHERE customer_id = customer_row.customer_id AND status = 3;
-- Calculate total money spent by the customer on orders with status 3
SELECT COALESCE(SUM(orderType_price), 0) INTO total_money_spent FROM Orders
INNER JOIN Order_Type ON Orders.order_type = Order_Type.orderType_id
WHERE customer_id = customer_row.customer_id AND status = 3;
-- Calculate reward points based on total orders and money spent
IF total_orders < 20 THEN
reward_points := total_money_spent * 0.15;
ELSIF total_orders < 50 THEN
reward_points := total_money_spent * 0.20;
ELSE
reward_points := total_money_spent * 0.25;
END IF;
-- Insert or update loyalty information for the customer
INSERT INTO Customer_Loyalty (customer_id, orders_placed, money_spent, points, created_at)
VALUES (customer_row.customer_id, total_orders, total_money_spent, reward_points, NOW())
ON CONFLICT (customer_id) DO UPDATE
SET orders_placed = EXCLUDED.orders_placed, money_spent = EXCLUDED.money_spent, points = EXCLUDED.points;
END LOOP;
END;
$$;
CALL calculate_customer_loyalty();
- If order is more than 1 day old, and if status is 1 (or Pending), move it to 2 (or Confirmed), if not done so.
- If order status is 2 (or Confirmed) and has stayed in that status for 7 days, move it to 3 (or Completed), if not done so.
- Once order is completed and feedback submitted, refresh employee's rating.
To implement the above logic (1,2), we need to use a procedure.
CREATE OR REPLACE PROCEDURE update_order_status()
LANGUAGE plpgsql
AS $$
DECLARE
order_row RECORD;
BEGIN
-- Move pending orders to confirmed if they are more than 1 day old
UPDATE Orders
SET status = 2, updated_at = NOW()
WHERE status = 1 AND (EXTRACT('day' FROM NOW() ) - EXTRACT('day' FROM created_at)) = 1;
-- Move confirmed orders to completed if they have stayed in that status for 7 days
UPDATE Orders
SET status = 3, updated_at = NOW()
WHERE status = 2 AND (EXTRACT('day' FROM NOW() ) - EXTRACT('day' FROM created_at)) = 7;
END;
$$;
CALL update_order_status();
To implement the above logic (3), we need to use a procedure.
CREATE OR REPLACE PROCEDURE update_employee_rating()
LANGUAGE plpgsql
AS $$
DECLARE
order_row RECORD;
v_employee INTEGER;
rating_sum INTEGER;
total_completed_orders INTEGER;
rating REAL;
BEGIN
FOR order_row IN SELECT order_id FROM orders LOOP
SELECT employee_id, SUM(feedback), COUNT(status)
INTO v_employee, rating_sum, total_completed_orders
FROM orders WHERE status = 3 AND order_id = order_row.order_id GROUP BY employee_id;
rating = rating_sum/total_completed_orders;
UPDATE employee SET employee_rating = rating WHERE employee_id = v_employee;
UPDATE employee SET updated_at = NOW() WHERE employee_id = v_employee;
END LOOP;
END;
$$;
CALL update_employee_rating();
- Order Table (with updated status)
- Employee Table (with updated rating)