Author: Gabriel Pedrosa
Date: April 15, 2023
IBM Data Engineering Professional Certificate
Final project to achieve the IBM Data Engineering Professional Certificate
The objective of this project is to develop a data platform for an e-commerce company called SoftCart, in which an end-to-end data pipeline will be developed, including the design and implementation of data architectures (RDBMS, NoSQL and Data Warehouse) , ETL processes, BI dashboards and Machine Learning models. All projects are independent of each other.
SoftCart's online presence is primarily through its website, which customers access using a variety of devices like laptops, mobiles and tablets.
All the catalog data of the products is stored in the MongoDB NoSQL server.
All the transactional data like inventory and sales are stored in the MySQL database server.
SoftCart's webserver is driven entirely by these two databases.
Data is periodically extracted from these two databases and put into the staging data warehouse running on PostgreSQL.
Production data warehouse is on the cloud instance of IBM DB2 server.
BI teams connect to the IBM DB2 for operational dashboard creation. IBM Cognos Analytics is used to create dashboards.
SoftCart uses Hadoop cluster as it big data platform where all the data collected for analytics purposes.
Spark is used to analyse the data on the Hadoop cluster.
To move data between OLTP, NoSQL and the dataware house ETL pipelines are used and these run on Apache Airflow.
-
OLTP database - MySQL
-
NoSql database - MongoDB
-
Production Data warehouse – DB2 on Cloud
-
Staging - Data warehouse – PostgreSQL
-
Business Intelligence Dashboard - IBM Cognos Analytics
-
Data Pipelines - Apache Airflow
-
Machine Learning - Apache Spark
Directory | Description |
---|---|
sql/ |
SQL files |
steps/ |
Development steps documentation |
imgs/ |
Images repository |
src/ |
Code repository |