All About Data Engineering

This is a continuously updated repo where I add details about the most widely used Data Engineering tools. It contains all the study topics I have completed so far, along with the use cases I have encountered or implemented during my tenure as a big data engineer.

Below are the Data Engineering tools that will be covered in this repo:

Topics	Current Status
HDFS	To Do
Hive	In Progress
Impala	To Do
Sqoop	To Do
Oozie	To Do
Core Python	In Progress
Core Scala	To Do
Spark-Scala	To Do
PySpark	In Progress
Spark Streaming with Python	In Progress
Spark Streaming with Scala	To Do
MySQL	In Progress
Airflow	To Do
Snowflake	To Do
MongoDB	To Do
Cassandra	To Do
Apache Beam	To Do

For each tool, this repo will cover the following:

Overview documents for each Data Engineering tool.
Coding examples.
Coding questions frequently asked in interviews.
Documents on practical use cases for Data Engineering tools.
Relevant research papers.

The aim of this repo is to be an all-in-one study guide that makes you an AWESOME DATA ENGINEER and helps you crack Data Engineering interviews.

dexi154/All-About-Data-Engineering