Data Engineering

This repository includes all the work and assignments I completed on DataCamp.

Introduction

In this track, I have discovered how to build an effective data architecture, streamline data processing, and maintain large-scale data systems. In addition to working with Python, I have also developed skills in Shell, SQL, and Scala to create data engineering pipelines, automate common file system tasks, and build high-performance databases. Through hands-on exercises, I have added cloud and big data tools such as AWS Boto, PySpark, Spark SQL, and MongoDB to my data engineering toolkit, using them to create and query databases, wrangle data, and configure schedules to run pipelines. A short illustrative example of this kind of work is sketched after the course list below.

1. Data Engineering for Everyone
2. Introduction to Data Engineering
3. Streamlined Data Ingestion with pandas
4. Writing Efficient Python Code
5. Writing Functions in Python
6. Introduction to Shell
7. Data Processing in Shell
8. Introduction to Bash Scripting
9. Unit Testing for Data Science in Python
10. Object-Oriented Programming in Python
11. Introduction to Airflow in Python
12. Introduction to PySpark
13. Building Data Engineering Pipelines in Python
14. Introduction to AWS Boto in Python
15. Introduction to Relational Databases in SQL
16. Database Design
17. Introduction to Scala
18. Big Data Fundamentals with PySpark
19. Cleaning Data with PySpark
20. Introduction to Spark SQL in Python
21. Cleaning Data in SQL Server Databases
22. Transactions and Error Handling in SQL Server
23. Building and Optimizing Triggers in SQL Server
24. Improving Query Performance in SQL Server
25. Introduction to MongoDB in Python
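
To give a flavour of the pipeline and data wrangling work covered in courses such as Introduction to PySpark and Cleaning Data with PySpark, here is a minimal, illustrative sketch. It is not taken from any specific course; the file paths and column names (order_id, quantity, unit_price) are assumptions used only for the example.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Start a local Spark session (the app name is arbitrary).
spark = SparkSession.builder.appName("example-pipeline").getOrCreate()

# Ingest a hypothetical raw CSV file, inferring the schema from the data.
raw = spark.read.csv("data/raw_orders.csv", header=True, inferSchema=True)

# Basic wrangling: drop duplicate rows, remove records missing an order id,
# and derive a total_price column from quantity and unit_price.
clean = (
    raw.dropDuplicates()
       .filter(F.col("order_id").isNotNull())
       .withColumn("total_price", F.col("quantity") * F.col("unit_price"))
)

# Persist the cleaned data as Parquet for downstream querying with Spark SQL.
clean.write.mode("overwrite").parquet("data/clean_orders")

spark.stop()
```

In the courses, steps like these are combined with schedulers such as Airflow so that ingestion, cleaning, and loading run automatically on a defined cadence.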

Reference

https://learn.datacamp.com/career-tracks/data-engineer-with-python?version=3