dataeng

Repository for the Data Engineering Course

Syllabus

Introduction

Lecture

  • What is (Big) Data?
  • The Role of the Data Engineer
  • From Data Warehouses to Data Lakes

Practice

  • Set Up Docker
  • Introduction to Jupyter Notebooks

Part 1: Data Modelling and Query Languages

Lecture

  • Relational Data
  • NoSQL
    • Document
    • Graph
  • Data Warehousing
    • Star and Snowflake schemas
  • Data Vault

Practice

  • Modelling and Querying Relational data: MySQL
  • Modelling and Querying Document data: MongoDB
  • Modelling and Querying Graph data: Cypher

Extras

  • Modelling and Querying RDF data: SPARQL
  • Domain-Driven Design: A Summary
  • Event Sourcing: A Summary

Part 2: Data Transformation & Data Systems

Lecture

  • Big Data Systems Architectures
  • ETL and Data Pipelines
    • Best Practices and Anti-Patterns
  • Batch vs Streaming Processing
  • Data Replication
  • Data Partitioning
  • Transactions

Practice

  • Data Ingestion with Apache Kafka
  • Data Pipelines with Apache Airflow
  • Data Processing with Kafka Streams/KSQL

Extras

  • Data Pipelines with Luigi
  • Data Pipelines with Apache NiFi
  • Data Processing with Apache Flink

Part 3: Data Wrangling

Lecture

  • Cleansing
  • Augmentation

Practice

  • Cleansing examples using OpenRefine
  • Augmentation examples using Pandas and TensorFlow

Contributing

Lecturers