Repository for projects developed in Udacity's Data Engineering Nanodegree.
Short description: Relational database modelling using PostgreSQL to model user activity data for a music streaming app.
Tools and technologies: Python, PostgreSQL, Star Schema, ETL pipelines, Normalization.
Short description: NoSQL database design using Apache Cassandra.
Tools and technologies: Python, Apache Cassandra, Denormalization.
Short description: Data warehouse design on Amazon Redshift.
Tools and technologies: Python, Amazon Redshift, AWS CLI, AWS SDK, SQL, PostgreSQL.
Short description: Scaled up ETL pipelines by moving the data warehouse to a data lake.
Tools and technologies: Spark, S3, EMR, Athena, AWS Glue, Parquet.
Short description: Automation of an ETL pipeline and creation of a data warehouse using Apache Airflow.
Tools and technologies: Apache Airflow, S3, Amazon Redshift, Python.
I use this for my own projects, and I know it might not be the perfect approach for every project out there. If you have any ideas, just [open an issue][issues] and tell me what you think.
If you'd like to contribute, please fork the repository and make changes as you'd like. Pull requests are warmly welcome.
Distributed under the GPL License. See LICENSE for more information.