Data Engineering Nanodegree projects

Data Engineering Nanodegree (Udacity)

Hi everyone, and welcome to my data engineering projects repository. It collects the projects I completed in the Data Engineering Nanodegree program from Udacity. Please feel free to use it as a reference for your own projects.

Projects

Five projects use the same data source (Sparkify). We tackle it with a different technology each time, such as PostgreSQL, Cassandra, AWS Redshift, and Spark. Working with the same data throughout shows how to apply each technology to extract, transform, and load data, and makes it easier to compare the benefits and drawbacks of each stack.

After learning all the technologies, we get the chance to choose among them and apply that knowledge ourselves in capstone_project. In short, we build an end-to-end data pipeline for I94 Immigration data.

The project is open-ended. There are many features I can still improve in capstone_project after graduating from the nanodegree. Here are some things I plan to improve in the future:

  • Orchestrate the data pipeline with Apache Airflow
  • Deploy the project in an AWS environment with CloudFormation
  • Try to create an end-to-end data pipeline with other data sources (API, JSON)

What about the Nanodegree?

Disclaimer: I took this course with a 75% monthly discount (~100 USD). I was also already familiar with Python, SQL, and Spark before taking it.

Time to complete

Here is my brief opinion on the Data Engineering Nanodegree. I took the course from 17 Mar 2021 to 14 Apr 2021, which means it is possible to finish the whole program in less than a month. Udacity's suggested time to complete the course is around five months.

Materials

You can fast-forward through the course material, because you can still access it after graduation to review and refresh your knowledge.

Project quality

The time I spent on the project assignments was valuable. My background in data engineering was next to nothing: before taking this course, I only occasionally created or updated a table in a database. It was great to build and update data tables from scratch. I think that part could be boring for other students. I saw some comments in the student forum saying it feels like typing practice, which I agree with for some assignments.

Development environment

Udacity workspace

The good thing about this nanodegree is that it provides a complete development environment. I can submit a Spark script through the terminal without any configuration. This environment is the best thing in the program. If you don't understand why it is good, try replicating the Udacity workspace on your local machine. It lets students focus on learning what is important rather than spending time configuring the environment.

AWS environment

Besides, the program gives you access to an AWS environment. You can experiment in a realistic cloud environment within a sufficient budget provided by Udacity. That is another great experience because, in my opinion, cloud architecture will become the mainstream of data pipeline development in the future. So the chance to learn this skill is valuable.

Interaction with the mentor

The project review is another feature I love about this nanodegree. In other MOOCs, the system usually grades your assignment automatically. In the nanodegree, you submit your project and a mentor reviews it. Some mentors gave me informative comments with enough material for me to study further on my own. This feature helps me learn new things faster.

Summary

In summary, I recommend this course to anyone who, like me, lacks confidence in starting their data engineering journey. It boosts your confidence and gives you many ideas for further projects. You can also add your assignments to your portfolio or GitHub to show that you know something about this topic.

It was worth the price in my case (75% discount), since I finished it within a month (~$100). If I had to pay the full price (~$1695) for five months of access, I think it would be too expensive for this quality. Udacity offers discount codes throughout the year, so you may want to wait for the right time. Best of luck, everyone!