
An event streaming pipeline built using Kafka, KSQL, Faust, and Spark Structured Streaming


Udacity Data Streaming Nanodegree

Projects completed in the Udacity Data Streaming Nanodegree program.

Constructed a streaming event pipeline around Apache Kafka and its ecosystem.

  • Configured producers to send events to Kafka together with Avro key and value schemas
  • Ingested data from a PostgreSQL database using Kafka Connect
  • Utilized Faust to transform ingested data
  • Aggregated data using KSQL
  • Configured a consumer to read the processed data back from Kafka
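The Faust transformation step above can be sketched as follows. The record shape (CTA-style station fields) and topic names are illustrative assumptions; the Faust wiring is shown in comments so the snippet runs without a broker.

```python
import dataclasses

@dataclasses.dataclass
class Station:
    """Shape of a record ingested via Kafka Connect; field names are assumptions."""
    station_id: int
    station_name: str
    red: bool
    blue: bool
    green: bool

def transform(station: Station) -> dict:
    """Flatten the per-line boolean columns into a single `line` field."""
    if station.red:
        line = "red"
    elif station.blue:
        line = "blue"
    else:
        line = "green"  # fallback; a real agent might validate instead
    return {
        "station_id": station.station_id,
        "station_name": station.station_name,
        "line": line,
    }

# In the project this function would run inside a Faust agent, e.g.:
# app = faust.App("stations-stream", broker="kafka://localhost:9092")
# out_topic = app.topic("stations.transformed", partitions=1)
# @app.agent(app.topic("connect-stations", value_type=Station))
# async def process(stations):
#     async for station in stations:
#         await out_topic.send(value=transform(station))
```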

Proficiencies used: Apache Kafka, Kafka Connect, Faust stream processing, KSQL
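The PostgreSQL ingestion step can be sketched as a Kafka Connect JDBC source connector configuration. The connector name, connection details, table, and column names below are illustrative assumptions; only the configuration keys follow the standard JDBC source connector.

```python
import json

# JDBC source connector config; connection details, table, and column
# names are illustrative assumptions for this sketch.
connector_config = {
    "name": "stations-jdbc-source",
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
        "connection.url": "jdbc:postgresql://localhost:5432/stations",
        "connection.user": "cta_admin",
        "connection.password": "chicago",
        "table.whitelist": "stations",
        # Poll for new rows by watching an ever-increasing key column.
        "mode": "incrementing",
        "incrementing.column.name": "stop_id",
        # Rows land on the topic "connect-stations".
        "topic.prefix": "connect-",
        "poll.interval.ms": "60000",
    },
}

# The config would be registered against the Connect REST API, e.g.:
# requests.post("http://localhost:8083/connectors",
#               data=json.dumps(connector_config),
#               headers={"Content-Type": "application/json"})
print(json.dumps(connector_config, indent=2))
```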

Created a Kafka server to produce data and ingested the data using Spark Structured Streaming.

  • Built a simple Kafka server (with ZooKeeper) for producing data
  • Created a Spark consumer to perform data aggregation and joining
  • Modified SparkSession property parameters to optimize processing throughput
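The Spark consumer described above can be sketched as a streaming read from Kafka followed by an aggregation. The broker address, topic name, and record fields are illustrative assumptions; `pyspark` is assumed installed, and the Kafka calls only execute once a `SparkSession` is passed in.

```python
# Kafka source options for the Spark consumer; broker address, topic
# name, and offset settings are illustrative assumptions.
KAFKA_OPTIONS = {
    "kafka.bootstrap.servers": "localhost:9092",
    "subscribe": "service.calls",
    "startingOffsets": "earliest",
    # Cap records per micro-batch so throughput stays predictable.
    "maxOffsetsPerTrigger": "200",
}

def start_aggregation(spark):
    """Read JSON events from Kafka and count them per event type."""
    from pyspark.sql.functions import from_json, col
    from pyspark.sql.types import StructType, StructField, StringType

    # Assumed shape of the JSON payload in the Kafka message value.
    schema = StructType([
        StructField("crime_id", StringType(), True),
        StructField("original_crime_type_name", StringType(), True),
    ])

    reader = spark.readStream.format("kafka")
    for key, value in KAFKA_OPTIONS.items():
        reader = reader.option(key, value)

    parsed = (reader.load()
              .selectExpr("CAST(value AS STRING) AS value")
              .select(from_json(col("value"), schema).alias("data"))
              .select("data.*"))

    counts = parsed.groupBy("original_crime_type_name").count()

    # "complete" mode re-emits the full aggregate table each trigger.
    return (counts.writeStream
            .outputMode("complete")
            .format("console")
            .start())
```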

Proficiencies used: Apache Kafka, Apache Spark, Spark Structured Streaming
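The SparkSession tuning mentioned in the last bullet can be sketched as a small set of property overrides. The specific values below are illustrative assumptions, not measured optima; in practice they would be adjusted while watching the Spark UI's processed-rows-per-second metric.

```python
# Session-level knobs that commonly affect streaming throughput;
# the values here are illustrative assumptions, not measured optima.
TUNING = {
    # Fewer shuffle partitions suit small local aggregations.
    "spark.sql.shuffle.partitions": "10",
    # Default parallelism for RDD operations without explicit partitioning.
    "spark.default.parallelism": "100",
}

def build_session(app_name="kafka-consumer"):
    """Build a local SparkSession with the tuning overrides applied."""
    from pyspark.sql import SparkSession  # pyspark assumed installed

    builder = SparkSession.builder.appName(app_name).master("local[*]")
    for key, value in TUNING.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

Note that the per-batch record cap (`maxOffsetsPerTrigger`) is an option on the Kafka source itself rather than a session property, so it is set on `readStream` instead.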