The aim of this project is to clean the data and prepare it for business analysis.
- The dataset contains Food Delivery data from a restaurant in Melbourne, Australia. The restaurant has three branches around the CBD area. All three branches share the same menu, but they are managed separately and therefore operate differently.
order_id
- A unique id for each order.
date
- The date the order was made, given in YYYY-MM-DD format.
time
- The time the order was made, given in hh:mm:ss format.
order_type
- A categorical attribute representing the different types of orders, namely: Breakfast, Lunch or Dinner.
branch_code
- A categorical attribute representing the branch code in which the order was made. Branch information is given in the branches.csv file.
order_items
- A list of tuples representing the order items: the first element of each tuple is the item ordered, and the second element is the quantity ordered for that item.
order_price
- A float value representing the order's total price.
customer_lat
- Latitude of the customer, coming from the nodes.csv file.
customer_lon
- Longitude of the customer, coming from the nodes.csv file.
customerHasloyalty?
- A logical variable denoting whether the customer has a loyalty card with the restaurant (1 if the customer has loyalty and 0 otherwise).
distance_to_customer_KM
- A float representing the shortest distance, in kilometers, between the branch and the customer nodes with respect to the nodes.csv and edges.csv files. Dijkstra's algorithm can be used to find the shortest path between two nodes in the graph (see the sketch after this data dictionary).
delivery_fee
- A float representing the delivery fee of the order.
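
Since distance_to_customer_KM must match the shortest path through the graph described by nodes.csv and edges.csv, it can be recomputed and validated with Dijkstra's algorithm. A minimal sketch using networkx is shown below; the column names u, v and distance are assumptions and need to be adjusted to the actual schema of edges.csv.

```python
import pandas as pd
import networkx as nx

# Assumed (hypothetical) schema: edges.csv has columns u, v and distance in km.
edges = pd.read_csv("edges.csv")

# Build an undirected, weighted graph from the edge list.
G = nx.from_pandas_edgelist(edges, source="u", target="v", edge_attr="distance")

def shortest_km(branch_node, customer_node):
    """Shortest distance in km between two nodes, computed with Dijkstra's algorithm."""
    return nx.dijkstra_path_length(G, branch_node, customer_node, weight="distance")

# Example usage: compare the recomputed distance with the recorded one for an order.
# recomputed = shortest_km(branch_node_id, customer_node_id)
```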
Write a python script to transform the data into a usable format for a BI Analyst.
- The transformation can generate more than one table if needed.
Analyze the data.
- Highlight critical errors that need to be corrected before the BI analyst can make use of the data.
Based on the information above:
- The dataset has 500 rows and 12 columns.
- The dataset has 0 null values.
Data transformations and cleaning tasks that might be needed include:
- Checking the data type of all columns to make sure they are in the appropriate format for analysis.
- The order_id, date, time, order_type, branch_code and order_items columns' datatypes need to be changed (see the first pandas sketch after this list).
- The date and time columns need to be merged into a single column, renamed to order_date and converted to the datetime dtype so dates can be parsed.
- Extracting the food items and their quantities from the order_items column into separate columns (see the parsing sketch after this list).
- Checking the category column entries for case sensitivity, i.e. checking that the values in the branch_code and order_type columns are consistent and correctly labeled (see the consistency-check sketch after this list).
- Feature engineering the order_date column, potentially separating it into year, month, day, and hour columns.
- Removing any unnecessary or irrelevant columns.
- Checking for outliers in numerical columns such as order_price, distance_to_customer_KM, and delivery_fee.
- Checking if the customerHasloyalty? column only contains binary values (0 and 1).
- Checking if the distance_to_customer_KM column is consistent with the values in the nodes.csv file.
- Checking if the delivery_fee column is consistent with the values in the edges.csv file.
- Checking if the order_price column is consistent with the values in the order_items column (see the price-recomputation sketch after this list).
- Checking if the order_price column is consistent with the values in the delivery_fee column.
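
For the datatype fixes, the date/time merge and the order_date feature engineering mentioned in the list above, a minimal pandas sketch could look like the following. The file name dirty_data.csv is a placeholder for the actual dataset file.

```python
import pandas as pd

df = pd.read_csv("dirty_data.csv")  # placeholder file name

# Cast the categorical columns to the 'category' dtype.
for col in ["order_type", "branch_code"]:
    df[col] = df[col].astype("category")

# Merge date & time into a single order_date column and parse it as datetime.
df["order_date"] = pd.to_datetime(df["date"] + " " + df["time"],
                                  format="%Y-%m-%d %H:%M:%S", errors="coerce")
df = df.drop(columns=["date", "time"])

# Feature engineering: split order_date into year, month, day and hour.
df["year"] = df["order_date"].dt.year
df["month"] = df["order_date"].dt.month
df["day"] = df["order_date"].dt.day
df["hour"] = df["order_date"].dt.hour
```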
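Extracting the items and quantities from order_items into a separate long-format table (one row per order/item pair) is one way to generate a second table for the BI analyst. This sketch assumes the column is stored as the string representation of a list of tuples, as described in the data dictionary.

```python
import ast
import pandas as pd

df = pd.read_csv("dirty_data.csv")  # placeholder file name

# order_items is stored as text such as "[('Burger', 2), ('Fries', 1)]";
# parse it back into a Python list of (item, quantity) tuples.
df["order_items"] = df["order_items"].apply(ast.literal_eval)

# One row per (order, item, quantity) -- a tidy second table for the BI analyst.
order_items_long = (
    df[["order_id", "order_items"]]
    .explode("order_items")
    .assign(item=lambda d: d["order_items"].str[0],
            quantity=lambda d: d["order_items"].str[1])
    .drop(columns="order_items")
)
```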
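The label-consistency, loyalty and outlier checks could be sketched as follows. The normalization rules (upper-case branch codes, title-case order types) are assumptions; inspect the actual values before applying them.

```python
import pandas as pd

df = pd.read_csv("dirty_data.csv")  # placeholder file name

# Inspect the categorical labels for case/spelling inconsistencies.
print(df["branch_code"].value_counts())
print(df["order_type"].value_counts())

# Normalize the labels (assumed rule: upper-case branch codes, title-case order types).
df["branch_code"] = df["branch_code"].str.strip().str.upper()
df["order_type"] = df["order_type"].str.strip().str.title()

# customerHasloyalty? should only contain the binary values 0 and 1.
bad_loyalty = ~df["customerHasloyalty?"].isin([0, 1])
print(f"{bad_loyalty.sum()} rows with unexpected loyalty values")

# Flag potential outliers in the numeric columns with the 1.5*IQR rule.
for col in ["order_price", "distance_to_customer_KM", "delivery_fee"]:
    q1, q3 = df[col].quantile([0.25, 0.75])
    iqr = q3 - q1
    mask = (df[col] < q1 - 1.5 * iqr) | (df[col] > q3 + 1.5 * iqr)
    print(f"{col}: {mask.sum()} potential outliers")
```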
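For the order_price vs. order_items consistency check, one approach is to recompute each order's total from per-item unit prices and flag mismatches. The menu_prices dictionary below is purely hypothetical; the real unit prices would need to come from a menu file or be derived from rows known to be correct.

```python
import ast
import pandas as pd

df = pd.read_csv("dirty_data.csv")  # placeholder file name
df["order_items"] = df["order_items"].apply(ast.literal_eval)

# Hypothetical unit prices -- replace with the real menu once it is known.
menu_prices = {"Burger": 12.5, "Fries": 4.0, "Coffee": 3.5}

def recompute_total(items):
    """Sum of unit price * quantity over all (item, quantity) tuples in an order."""
    return sum(menu_prices.get(item, 0.0) * qty for item, qty in items)

df["recomputed_price"] = df["order_items"].apply(recompute_total)

# Flag orders whose recorded price disagrees with the recomputed total.
mismatch = (df["recomputed_price"] - df["order_price"]).abs() > 0.01
print(f"{mismatch.sum()} orders where order_price does not match order_items")
```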
Remember, the goal of this process is to ensure that your dataset is clean, understandable, and ready for further Business Intelligence analysis.
The link to the Google Colab Notebook - Melbourne Delivery