This is a continuously updated repo where I add details on the most widely used Data Engineering tools. It consists of all the study topics I have completed, along with the use cases I have worked on in my tenure as a big data engineer so far.
Below are the Data Engineering tools that will be covered in this repo:
Topics | Current Status |
---|---|
HDFS | To Do |
Hive | In Progress |
Impala | To Do |
Sqoop | To Do |
Oozie | To Do |
Core Python | In Progress |
Core Scala | To Do |
Spark-Scala | To Do |
PySpark | In Progress |
Spark Streaming with Python | In Progress |
Spark Streaming with Scala | To Do |
MySQL | In Progress |
Airflow | To Do |
Snowflake | To Do |
MongoDB | To Do |
Cassandra | To Do |
Apache Beam | To Do |
Below are the kinds of material that will be covered for each tool in this repo:
- Overview documents for the different Data Engineering tools.
- Coding examples.
- Coding questions frequently asked in interviews.
- Documents on practical use cases for Data Engineering tools.
- Research papers.
The aim of this repo is to be an all-in-one study guide that makes you an AWESOME DATA ENGINEER and helps you crack Data Engineering interviews.