Data Engineering Roadmap
- Learn SQL: aggregations with GROUP BY, joins (INNER, LEFT, FULL OUTER), window functions, common table expressions, etc.
You can learn from https://www.w3schools.com/
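
To make the SQL topics concrete, here is a small sketch that combines a LEFT JOIN, GROUP BY, a common table expression, and a window function. It runs on Python's built-in sqlite3 module. The tables, columns, and values are made up for illustration, and window functions assume SQLite 3.25 or newer.

```python
import sqlite3

# In-memory database with two hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
INSERT INTO orders VALUES (1, 1, 50.0), (2, 1, 30.0), (3, 2, 20.0);
""")

query = """
WITH totals AS (                       -- common table expression
    SELECT c.name AS name,
           COALESCE(SUM(o.amount), 0) AS total
    FROM customers AS c
    LEFT JOIN orders AS o              -- keeps customers with no orders
           ON o.customer_id = c.id
    GROUP BY c.id, c.name              -- one row per customer
)
SELECT name,
       total,
       RANK() OVER (ORDER BY total DESC) AS rnk   -- window function
FROM totals
ORDER BY rnk, name
"""

rows = conn.execute(query).fetchall()
for name, total, rnk in rows:
    print(rnk, name, total)
```

The LEFT JOIN is what keeps the customer with no orders in the result. With an INNER JOIN that row would disappear.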
- Learn Python/Scala: the basics (for/while loops, if statements), functional programming, abstract methods, and traits. Learn libraries like NumPy, pandas, scikit-learn, etc.
You can learn from https://lnkd.in/gSz45km5
- Learn distributed computing: Hadoop versions and architecture, and fault tolerance in Hadoop. Read up on MapReduce processing and the optimizations used in MapReduce.
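
The MapReduce idea is easier to see in a toy word count. This is a single-process sketch in plain Python, not Hadoop code. The function names (map_phase, shuffle, reduce_phase) and the input lines are made up for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


lines = ["big data big ideas", "data pipelines move data"]

# In a real cluster the map calls run in parallel on different nodes.
mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
```

In a real cluster the shuffle step moves data across the network, which is why many MapReduce optimizations aim to shrink the map output before it is shuffled.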
- Learn data ingestion tools: Sqoop/Kafka/NiFi. Understand their functionality and how they run jobs.
- Learn data processing/NoSQL: Spark architecture, RDDs/DataFrames/Datasets, lazy evaluation, DAGs and the lineage graph, optimization techniques, YARN utilization, Spark Streaming, etc.
- Learn data warehousing: understand how Hive stores and processes data, the different file formats and compression techniques, partitioning and bucketing, the different UDFs available in Hive, and SCD concepts. Ex.: HBase, Cassandra.
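
For the SCD concepts, a Type 2 slowly changing dimension keeps history by closing the old row and adding a new one. Below is a minimal sketch in plain Python, assuming a dimension held as a list of dicts. The function name apply_scd2 and the field names (valid_from, valid_to, is_current) are illustrative choices, not a Hive API.

```python
from datetime import date


def apply_scd2(dimension, key, new_attrs, effective):
    """Apply one change to a Type 2 dimension (a list of dict rows).

    If the current row for `key` already has `new_attrs`, nothing happens.
    Otherwise the current row is closed and a new current row is appended.
    """
    for row in dimension:
        if row["key"] == key and row["is_current"]:
            if row["attrs"] == new_attrs:
                return dimension  # no change, keep history as-is
            row["valid_to"] = effective  # close the old version
            row["is_current"] = False
    dimension.append({
        "key": key,
        "attrs": new_attrs,
        "valid_from": effective,
        "valid_to": None,  # open-ended until the next change
        "is_current": True,
    })
    return dimension


dim = []
apply_scd2(dim, "cust-1", {"city": "Pune"}, date(2023, 1, 1))
apply_scd2(dim, "cust-1", {"city": "Delhi"}, date(2024, 6, 1))
apply_scd2(dim, "cust-1", {"city": "Delhi"}, date(2024, 7, 1))  # no-op

for row in dim:
    print(row["attrs"]["city"], row["valid_from"], row["valid_to"], row["is_current"])
```

A Type 1 dimension would simply overwrite the city and lose the history, which is the main trade-off to understand here.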
- Learn job orchestration: Airflow/Oozie, workflows, cron, etc.
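
At its core, an orchestrator runs tasks in dependency order. The sketch below shows that idea using graphlib from the Python standard library (Python 3.9+). It is not Airflow or Oozie code, and the task names are made up for illustration.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks that must finish before it starts.
dag = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
}

# static_order() yields the tasks so that dependencies always come first.
order = list(TopologicalSorter(dag).static_order())
print(order)
```

Real orchestrators add scheduling (for example cron expressions), retries, and monitoring on top of this ordering.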
- Learn cloud computing: Azure/AWS/GCP. Understand the significance of the cloud in #dataengineering. Learn Azure Synapse/Redshift/BigQuery, and ingestion/pipeline tools like ADF, etc.
- Learn the basics of CI/CD and Linux commands: read about Kubernetes/Docker and how crucial they are in data engineering.