The data for this project is available in the Excel file `sales.xlsx`.
The columns of the dataset are described in the table below.

| Column | Description |
|---|---|
| InvoiceNumber | A 6-digit number uniquely assigned to each invoice. If the number starts with the letter C, the invoice has been canceled. |
| ProductCode | A 5-digit number uniquely assigned to each type of product. |
| ProductName | The product's name. |
| Quantity | The quantity of a product type ordered in the invoice. |
| InvoiceDate | The invoice creation date. |
| UnitPrice | The price of a product type per unit. |
| CusotmerId | A 5-digit number uniquely assigned to each customer. |
| Country | The name of the customer's country of residence. |

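
As a rough illustration of how the columns above can be used, the sketch below flags canceled invoices by their leading `C` and computes a per-line amount. The helper names (`load_sales`, `add_invoice_flags`) and the derived columns (`IsCanceled`, `Amount`) are hypothetical and not part of the dataset, and the sample rows are made up. Reading the workbook with `pd.read_excel` also assumes an Excel engine such as openpyxl is installed.

```python
import pandas as pd


def load_sales(path="sales.xlsx"):
    """Read the workbook, keeping InvoiceNumber as text so a leading 'C' survives."""
    return pd.read_excel(path, dtype={"InvoiceNumber": str})


def add_invoice_flags(df):
    """Return a copy with IsCanceled and Amount columns added (both are illustrative names)."""
    out = df.copy()
    # An invoice number that starts with 'C' marks a canceled invoice.
    out["IsCanceled"] = out["InvoiceNumber"].astype(str).str.startswith("C")
    # Line amount = quantity times unit price.
    out["Amount"] = out["Quantity"] * out["UnitPrice"]
    return out


# Tiny made-up sample standing in for sales.xlsx (values are illustrative only).
sample = pd.DataFrame(
    {
        "InvoiceNumber": ["536365", "C536379", "536380"],
        "Quantity": [6, 1, 2],
        "UnitPrice": [2.5, 27.5, 1.25],
    }
)
flagged = add_invoice_flags(sample)
print(flagged[["InvoiceNumber", "IsCanceled", "Amount"]])
```

With the real file, `add_invoice_flags(load_sales())` would apply the same logic to the whole dataset.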
This project has 5 steps:

Data preprocessing
: A series of preprocessing steps is performed on the entire dataset, such as handling missing data.

Exploration
: I answer a series of high-level questions and obtain an intuitive view of the company's financial information.

Study of target markets
: I analyze the different locations of sale and supply, and I check which countries generate few sales despite having many customers.

Customer value
: Using the RFM (recency, frequency, monetary) criteria, I divide the company's customers into 7 categories, each with its own meaning and behavior from a marketing perspective.

Customer retention rate analysis
: What percentage of customers buy from the company again in the months following their first purchase?
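
The last two steps can be sketched roughly as follows. This is a minimal illustration, not the project's actual implementation: it computes raw recency, frequency, and monetary values per customer and a month-by-month retention table per first-purchase cohort. It assumes canceled invoices have already been removed and that `InvoiceDate` is a datetime column. The function names, the `as_of` reference date, and the sample rows are all hypothetical, and the sketch stops short of the 7-category segmentation, whose rules are not given here.

```python
import pandas as pd

CUSTOMER = "CusotmerId"  # spelled as in the column table; rename if the file differs


def rfm_values(df, as_of):
    """Recency (days since last purchase), Frequency (distinct invoices), Monetary (total spend)."""
    grouped = df.assign(Amount=df["Quantity"] * df["UnitPrice"]).groupby(CUSTOMER)
    return pd.DataFrame(
        {
            "Recency": (as_of - grouped["InvoiceDate"].max()).dt.days,
            "Frequency": grouped["InvoiceNumber"].nunique(),
            "Monetary": grouped["Amount"].sum(),
        }
    )


def retention_matrix(df):
    """Share of each first-purchase cohort that buys again N months after the first purchase."""
    first = df.groupby(CUSTOMER)["InvoiceDate"].transform("min")
    # Whole months elapsed between a purchase and that customer's first purchase.
    offset = (df["InvoiceDate"].dt.year - first.dt.year) * 12 + (
        df["InvoiceDate"].dt.month - first.dt.month
    )
    counts = (
        df.assign(Cohort=first.dt.strftime("%Y-%m"), MonthOffset=offset)
        .groupby(["Cohort", "MonthOffset"])[CUSTOMER]
        .nunique()
        .unstack(fill_value=0)
    )
    # Divide every column by the cohort size (customers active in month 0).
    return counts.div(counts[0], axis=0)


# Made-up sample: three customers, five invoices (illustrative only).
sample = pd.DataFrame(
    {
        "InvoiceNumber": ["100001", "100002", "100003", "100004", "100005"],
        CUSTOMER: [1, 1, 2, 2, 3],
        "InvoiceDate": pd.to_datetime(
            ["2011-01-05", "2011-02-10", "2011-01-20", "2011-03-15", "2011-02-01"]
        ),
        "Quantity": [2, 1, 3, 1, 4],
        "UnitPrice": [5.0, 10.0, 2.0, 4.0, 1.5],
    }
)
rfm = rfm_values(sample, as_of=pd.Timestamp("2011-04-01"))
retention = retention_matrix(sample)
print(rfm)
print(retention)
```

In the retention table, each row is a cohort (month of first purchase) and each column is the number of months since that first purchase, so column 0 is always 1.0.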
I used only NumPy, pandas, Matplotlib, and seaborn in this project.