Projects completed in the Udacity Data Streaming Nanodegree program.
Project 1: Optimizing Public Transportation
Constructed a streaming event pipeline around Apache Kafka and its ecosystem.
- Configured producers to send events to Kafka with Avro key and value schemas
- Ingested data from a PostgreSQL database using Kafka Connect
- Utilized Faust to transform ingested data
- Aggregated data using KSQL
- Configured a consumer to read the resulting data from Kafka
Proficiencies used: Apache Kafka, Kafka Connect, Faust stream processing, KSQL
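
As an illustration of the transformation step, here is a minimal plain-Python sketch of the kind of per-event function a Faust agent might apply to records ingested through Kafka Connect. The record shapes (`Station`, `TransformedStation`), their field names, and the sample values are assumptions made for this example, not the project's actual schema. The Faust wiring itself (app, topics, agent) is left out so the logic can run on its own.

```python
from dataclasses import dataclass


@dataclass
class Station:
    """Hypothetical shape of a station row ingested from PostgreSQL."""
    station_id: int
    station_name: str
    order: int
    red: bool
    blue: bool
    green: bool


@dataclass
class TransformedStation:
    """Hypothetical slimmed-down record emitted after the transformation."""
    station_id: int
    station_name: str
    order: int
    line: str


def transform_station(station: Station) -> TransformedStation:
    """Collapse the three boolean line flags into a single line-color string."""
    if station.red:
        line = "red"
    elif station.blue:
        line = "blue"
    elif station.green:
        line = "green"
    else:
        line = "unknown"  # assumption: no flag set means the line is unknown
    return TransformedStation(
        station.station_id, station.station_name, station.order, line
    )


# Illustrative input only; these values are made up for the example.
example = transform_station(Station(1, "Example Station", 3, False, True, False))
print(example)
```

In a streaming application, a function like this would typically be called once per incoming event, with the result forwarded to an output topic.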
Project 2: SF Crime Statistics with Spark Streaming
Created a Kafka server to produce data and ingested the data using Spark Structured Streaming.
- Built a simple Kafka server (with Zookeeper) for producing data
- Created a Spark consumer to perform data aggregation and joining
- Modified SparkSession property parameters to optimize processing throughput
Proficiencies used: Apache Kafka, Apache Spark, Spark Structured Streaming
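
The tuning step can be sketched as a small helper that turns a dictionary of Spark properties into `spark-submit --conf` arguments. The helper name `to_submit_args` and the values shown are hypothetical and are not the settings used in the project. The property keys `spark.sql.shuffle.partitions` and `spark.default.parallelism` are standard Spark settings that are commonly adjusted when tuning throughput.

```python
def to_submit_args(conf):
    """Render Spark properties as a flat list of spark-submit --conf arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args.extend(["--conf", f"{key}={value}"])
    return args


# Hypothetical values; suitable numbers depend on the cluster and the data volume.
tuning = {
    "spark.sql.shuffle.partitions": 8,
    "spark.default.parallelism": 8,
}

submit_args = to_submit_args(tuning)
print(" ".join(submit_args))
```

The same key/value pairs could instead be set in code by passing each one to `SparkSession.builder.config(key, value)` before creating the session.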