
Technical Assessment Solution

My solution submission for the technical assessment task.

Project Pipeline

[Figure: project pipeline diagram]

How to run

I designed the code to be dockerized so that all the scripts run in a single container that holds the data and drives the flow of the execution.

1. Running docker-compose

You need to run the docker-compose.yml file first to build the environment that holds the data. It contains:

  1. Jupyter Notebook (port: 8888)
  2. Postgres Database (port: 5432)
    • User: root
    • Password: root
    • DB: RetailDB
  3. Pg-Admin4 (port: 8080)

You can build the containers by typing the following in the main project directory, where docker-compose.yml is located:

docker-compose up -d 

Note: Docker should be installed.
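
Once the stack is up, a quick sanity check (assuming the default service definitions above) is to list the running containers and confirm the published ports, then open Jupyter Notebook on http://localhost:8888 and pgAdmin on http://localhost:8080:

docker-compose ps
docker ps --format "table {{.Names}}\t{{.Ports}}"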

2. Configure pg-admin4 to connect to RetailDB

[Screenshots: pgAdmin 4 connection configuration steps]
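
As a command-line alternative (assuming a psql client is installed on the host and port 5432 is published as described above), you can verify the database is reachable before configuring pgAdmin. Note that when registering the server inside pgAdmin, the host name should be the Postgres service name from docker-compose.yml rather than localhost, because pgAdmin runs inside the same Docker network:

psql -h localhost -p 5432 -U root -d RetailDB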

3. Running the scripts container

To be able to run the scripts, you need to build and run the Dockerfile. You can execute the following commands in the directory that contains the Dockerfile:

  1. Build the image: artefact-project:v01
    docker build -t artefact-project:v01 .
  2. Run the container (see the note below if the container name is already in use)
    docker run -it --network=global-network --name artefact_project_container artefact-project:v01
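
A standard Docker caveat: docker run --name fails if a container with that name already exists from a previous run. If that happens, remove the old container first and then re-run the command above:

docker rm -f artefact_project_container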

4. Scheduled Run

To run the scripts periodically, we need to set up a scheduled run. In our case I've built a cron job that runs the code every day at 10:00 AM. In your terminal, run the following commands:

  1. Open crontab
    crontab -e
  2. Put the schedule at the bottom of the opened file, then save and close it (see the note below about cron's working directory)
    0 10 * * * docker build -t artefact-project:v01 . && docker run --rm --network=global-network --name artefact_project_container artefact-project:v01
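
Keep in mind that cron runs jobs from the user's home directory and without a TTY, so the relative build context will not point at the project directory unless the job changes into it first, and interactive flags such as -it would fail. A more robust entry (with /path/to/technical-assessment as a hypothetical placeholder for wherever you cloned the repository, and output appended to a log file for easier debugging) might look like:

0 10 * * * cd /path/to/technical-assessment && docker build -t artefact-project:v01 . && docker run --rm --network=global-network --name artefact_project_container artefact-project:v01 >> /tmp/artefact_project_cron.log 2>&1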

You can watch your schedule's log by typing the following command:

grep CRON /var/log/syslog

Data Warehouse Schema

[Figure: data warehouse schema diagram]

Partitioning and Indexing

You will find the partitioning and indexing strategy in the partitioning-indexing directory; its README and the per-table SQL scripts document the approach.

Quality and Version Management

You will find the quality and version management strategies in the quality-versioning-management directory and its README.

Project Investigation and Deep Dive

You can find my sandbox Jupyter notebooks in the jupyter-data directory; they contain the investigations and experiments I worked through before building the final scripts.

Project Structure

├── Dockerfile
├── LICENSE
├── README.md
├── build_populate_dwh.py
├── clean_data.py
├── crontab.sh
├── docker-compose.yml
├── dwh-design
├── etl-data
├── etl_utils.py
├── extract_transform_load_data.py
├── images
├── ingest_base_data.py
├── jupyter-data
│ ├── data_cleaning_validation.ipynb
│ ├── data_ingestion.ipynb
│ ├── data_warehouse_build.ipynb
│ ├── etl_process.ipynb
│ └── online_retail.csv
├── partitioning-indexing
│ ├── DimCustomer
│ │ └── DimCustomer.sql
│ ├── DimDate
│ │ └── DimDate.sql
│ ├── DimProduct
│ │ └── DimProduct.sql
│ ├── FactRetailSales
│ │ └── FactRetailSales.sql
│ ├── online_retail_sales
│ │ └── online_retail_sales.sql
│ └── README.md
├── quality-versioning-management
│ ├── ALTER_online_retail_sales.sql
│ ├── LOGGING_online_retail_sales.sql
│ └── README.md
├── run_project.sh
└── utils.py

11 directories, 59 files