
This Repo contains all the tools needed for Data Engineering

Primary Language: Jupyter Notebook

All About Data Engineering

This is a continuously updated repo where I will add details on the most widely used Data Engineering tools. It consists of the study topics I have completed and the use cases I have worked on so far in my tenure as a big data engineer.

Below are the Data Engineering tools that will be covered in this repo:

Topic                        | Current Status
HDFS                         | To Do
Hive                         | In Progress
Impala                       | To Do
Sqoop                        | To Do
Oozie                        | To Do
Core Python                  | In Progress
Core Scala                   | To Do
Spark-Scala                  | To Do
PySpark                      | In Progress
Spark Streaming with Python  | In Progress
Spark Streaming with Scala   | To Do
MySQL                        | In Progress
Airflow                      | To Do
Snowflake                    | To Do
MongoDB                      | To Do
Cassandra                    | To Do
Apache Beam                  | To Do

For each tool, this repo will cover the following:

  • Overview documents for the different Data Engineering tools.
  • Coding examples.
  • Coding questions frequently asked in interviews.
  • Documents on practical use cases for the Data Engineering tools.
  • Relevant research papers.

The aim of this repo is to be an all-in-one study guide that makes you an AWESOME DATA ENGINEER and helps you crack Data Engineering interviews.