Data Engineering Projects

Primary Language: Jupyter Notebook

The DataEngineeringProjects repository consists of four data engineering projects.


CSV files were read into DataFrames using Pandas. The DataFrames were then loaded into SQL Server using Python.
The following technologies were used in this project:

  • Python (Pandas, pyodbc)
  • SQL Server
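
The load step described above could be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the function names, the target table, and the connection string are hypothetical, and the table is assumed to exist already with columns matching the CSV header.

```python
def build_insert_sql(table, columns):
    """Build a parameterized INSERT statement using pyodbc's `?` placeholders."""
    column_list = ", ".join(f"[{name}]" for name in columns)
    placeholders = ", ".join("?" for _ in columns)
    return f"INSERT INTO {table} ({column_list}) VALUES ({placeholders})"


def load_csv_to_sql_server(csv_path, table, connection_string):
    """Read a CSV into a DataFrame and append its rows to a SQL Server table.

    Assumes `table` already exists and its columns match the CSV header.
    """
    # Third-party imports are local so build_insert_sql works without them.
    import pandas as pd
    import pyodbc

    df = pd.read_csv(csv_path)
    # Replace NaN with None so missing values are stored as SQL NULL.
    df = df.astype(object).where(df.notna(), None)
    sql = build_insert_sql(table, list(df.columns))
    rows = list(df.itertuples(index=False, name=None))

    conn = pyodbc.connect(connection_string)
    try:
        if rows:
            cursor = conn.cursor()
            cursor.fast_executemany = True  # send rows to the server in batches
            cursor.executemany(sql, rows)
            conn.commit()
    finally:
        conn.close()
    return len(rows)
```

A call might look like `load_csv_to_sql_server("data.csv", "dbo.my_table", conn_str)`, where all three arguments are placeholders for your own file, table, and ODBC connection string.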


This is an IoT smoke detection project in which data was emitted at high volume in a continuous, incremental manner, with the goal of low-latency processing. Apache Kafka was used to process the streaming data in real time; the data was then transformed by Apache Spark and finally loaded into SQL Server.
The following technologies were used in this project:

  • Python (Pandas)
  • Apache Kafka
  • Apache Spark
  • SQL Server
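
One way the Kafka to Spark to SQL Server flow could look is sketched below with Spark Structured Streaming. The sensor field names, topic, JDBC URL, and table are assumptions made for illustration; the README does not specify them. The sketch also assumes the Spark Kafka connector package and a SQL Server JDBC driver are available to the Spark session.

```python
# Hypothetical sensor fields; the real message schema is not given in this README.
SENSOR_FIELDS = {
    "device_id": "string",
    "temperature_c": "double",
    "humidity_pct": "double",
    "event_time": "timestamp",
}


def schema_ddl(fields):
    """Turn a {name: type} mapping into a DDL schema string for from_json."""
    return ", ".join(f"{name} {dtype}" for name, dtype in fields.items())


def run_stream(bootstrap_servers, topic, jdbc_url, table, jdbc_properties):
    """Read JSON messages from Kafka, parse them, and append each batch to SQL Server."""
    # Imported lazily so schema_ddl can be used without a Spark installation.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json

    spark = SparkSession.builder.appName("smoke-detection-stream").getOrCreate()

    # Kafka delivers the payload as bytes in the `value` column.
    raw = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", bootstrap_servers)
        .option("subscribe", topic)
        .load()
    )

    # Transform step: decode the payload and expand the JSON into typed columns.
    parsed = from_json(col("value").cast("string"), schema_ddl(SENSOR_FIELDS))
    readings = raw.select(parsed.alias("reading")).select("reading.*")

    def write_batch(batch_df, batch_id):
        # Each micro-batch is an ordinary DataFrame, so the JDBC writer applies.
        batch_df.write.jdbc(
            url=jdbc_url, table=table, mode="append", properties=jdbc_properties
        )

    query = readings.writeStream.foreachBatch(write_batch).start()
    query.awaitTermination()
```

`foreachBatch` is used here because Structured Streaming has no built-in JDBC sink; writing each micro-batch with the batch JDBC writer is a common workaround, though it may not be what this project does.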


Data was extracted from websites that publish currency exchange rates.
The data was generated continuously, and Apache Kafka was used to process the streaming data in real time. These tasks were triggered by Apache Airflow.
Data from Apache Kafka was read and transformed by Apache Spark.
Finally, the data was loaded into PostgreSQL.
Docker was used to run the application in multiple containers.
The following technologies were used in this project:

  • Python
  • Apache Airflow
  • Apache Kafka
  • Apache Spark
  • PostgreSQL
  • Docker
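
The Airflow-triggered publishing step could be sketched as below. This assumes Airflow 2.4 or later and the kafka-python package; the DAG id, topic name, broker address, schedule, and record layout are all hypothetical, and `fetch_rates` is a stand-in that returns a hard-coded illustrative value instead of scraping a real website.

```python
import json
from datetime import datetime, timezone


def to_kafka_value(base, quote, rate, fetched_at):
    """Serialize one exchange-rate record as UTF-8 JSON bytes for Kafka."""
    record = {
        "base": base,
        "quote": quote,
        "rate": rate,
        "fetched_at": fetched_at.isoformat(),
    }
    return json.dumps(record, sort_keys=True).encode("utf-8")


def fetch_rates():
    """Placeholder for the scraping step; the value is illustrative, not real data."""
    return [("USD", "EUR", 0.92)]


def publish_rates(topic="exchange_rates", bootstrap_servers="kafka:9092"):
    """Fetch the current rates and send one Kafka message per currency pair."""
    from kafka import KafkaProducer  # kafka-python, imported lazily

    producer = KafkaProducer(bootstrap_servers=bootstrap_servers)
    now = datetime.now(timezone.utc)
    try:
        for base, quote, rate in fetch_rates():
            producer.send(topic, value=to_kafka_value(base, quote, rate, now))
        producer.flush()  # block until buffered messages are delivered
    finally:
        producer.close()


def build_dag():
    """Create a DAG that runs publish_rates every five minutes (assumed schedule)."""
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    with DAG(
        dag_id="exchange_rates_to_kafka",
        start_date=datetime(2024, 1, 1),
        schedule="*/5 * * * *",
        catchup=False,
    ) as dag:
        PythonOperator(task_id="publish_rates", python_callable=publish_rates)
    return dag
```

In an actual DAG file, `dag = build_dag()` would be called at module level so the Airflow scheduler can discover the DAG.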


A US Dollar Exchange Rates table and a Percentage Change in the Last 24 Hours table were extracted from a website and loaded into a MinIO bucket using Python.
This data was also generated continuously, and Apache Kafka was used to process the streaming data in real time. These tasks were triggered by Apache Airflow.
Data from Apache Kafka was read and transformed by Apache Spark.
Finally, the data was loaded into Apache Cassandra.
Docker was used to run the application in multiple containers.
The following technologies were used in this project:

  • Python
  • Apache Airflow
  • MinIO
  • Apache Kafka
  • Apache Spark
  • Apache Cassandra
  • Docker
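
As a rough illustration of the MinIO loading step, the sketch below serializes an extracted table to CSV and uploads it with the `minio` Python client. The bucket name, object name, endpoint, and credentials are placeholders to be supplied by the caller, and storing the table as CSV is an assumption; the README does not say which file format the project uses.

```python
import csv
import io


def rows_to_csv_bytes(header, rows):
    """Serialize a header and data rows to UTF-8 encoded CSV bytes."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue().encode("utf-8")


def upload_table(bucket, object_name, header, rows, endpoint, access_key, secret_key):
    """Upload an extracted table to a MinIO bucket as a CSV object."""
    from minio import Minio  # imported lazily; requires the `minio` package

    # secure=False assumes a local, non-TLS MinIO container.
    client = Minio(
        endpoint=endpoint,
        access_key=access_key,
        secret_key=secret_key,
        secure=False,
    )
    if not client.bucket_exists(bucket_name=bucket):
        client.make_bucket(bucket_name=bucket)

    payload = rows_to_csv_bytes(header, rows)
    client.put_object(
        bucket_name=bucket,
        object_name=object_name,
        data=io.BytesIO(payload),
        length=len(payload),
        content_type="text/csv",
    )
    return len(payload)
```

The serialization helper is kept separate from the upload so it can be checked without a running MinIO server.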