The first step in the Data Science Life Cycle is to understand the business whose data we are about to interpret. The data itself reveals quite a bit about the business model:
- This is a small business with only nine employees.
- Employees are responsible for selling to specific regions, which are subdivided into specific territories.
- Order quantities are large and the company works with suppliers, which suggests it is a specialty grocery and/or restaurant supplier.
The data comes from the Northwind database, a free, open-source dataset created by Microsoft that contains data from a fictional company.
The data is provided as a SQLite database. After glancing through the tables at https://sqliteonline.com/, we noted a few points for reference:
- The tables CustomerCustomerDemo and CustomerDemographics contain no data.
- While the ERD lists every ID column by a specific name, the tables themselves do not use those names: the key column in each table is labeled only "Id". Some columns will therefore need to be renamed before the tables (dataframes) can be joined.
- Since basic SQL queries alone will not be efficient for the purposes of this project, we will load the database tables into pandas DataFrames using SQLAlchemy and pandas.
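The notes above (empty tables, generic "Id" keys, and loading everything into pandas) can be sketched as follows. This is a minimal sketch under stated assumptions: it uses a standard-library `sqlite3` connection, which `pandas.read_sql_query` accepts directly, in place of a SQLAlchemy engine; an engine created with SQLAlchemy's `create_engine` and a `sqlite:///` URL pointing at the database file could be passed instead. The toy tables and their columns, and the helper names `load_all_tables`, `empty_tables`, and `with_prefixed_id`, are hypothetical, chosen only to mirror the notes above rather than the actual Northwind schema.

```python
from __future__ import annotations

import sqlite3

import pandas as pd


def load_all_tables(conn: sqlite3.Connection) -> dict[str, pd.DataFrame]:
    """Read every user table in a SQLite database into its own DataFrame."""
    # sqlite_master lists the schema; skip SQLite's internal sqlite_* tables.
    names = pd.read_sql_query(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name",
        conn,
    )["name"]
    # Double quotes protect table names that are SQL keywords, such as "Order".
    return {name: pd.read_sql_query(f'SELECT * FROM "{name}"', conn) for name in names}


def empty_tables(tables: dict[str, pd.DataFrame]) -> list[str]:
    """Return the names of tables that contain no rows."""
    return [name for name, df in tables.items() if df.empty]


def with_prefixed_id(df: pd.DataFrame, entity: str) -> pd.DataFrame:
    """Rename the generic 'Id' column to e.g. 'EmployeeId' so it can serve as a join key."""
    return df.rename(columns={"Id": f"{entity}Id"})


# Hypothetical toy database standing in for the real Northwind file.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Employee (Id INTEGER PRIMARY KEY, LastName TEXT);
    CREATE TABLE "Order" (Id INTEGER PRIMARY KEY, EmployeeId INTEGER);
    CREATE TABLE CustomerDemographics (Id TEXT PRIMARY KEY, CustomerDesc TEXT);
    INSERT INTO Employee VALUES (1, 'Davolio'), (2, 'Fuller');
    INSERT INTO "Order" VALUES (10248, 1), (10249, 2), (10250, 1);
    """
)

tables = load_all_tables(conn)
print(empty_tables(tables))

# Rename each table's 'Id' before joining so the key columns stay unambiguous.
employees = with_prefixed_id(tables["Employee"], "Employee")
orders = with_prefixed_id(tables["Order"], "Order")
merged = orders.merge(employees, on="EmployeeId", how="left")
print(merged)
conn.close()
```

Renaming first matters because both toy tables carry an `Id` column; merging them without renaming would leave ambiguous `Id_x` and `Id_y` columns in the result.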