
Run an Apache Spark Structured Streaming job on a local machine with an HTTP web server as the streaming source.

Primary language: Scala. License: MIT.

Spark HTTP Streaming

This project demonstrates how to use a local HTTP server as a streaming source to debug a Structured Streaming job on a local machine. The Spark app starts a local HTTP server, puts the ingested data on a MemoryStream, and uses that MemoryStream as the streaming source.
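The idea can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes Spark 3.x on the classpath, uses the JDK's built-in `com.sun.net.httpserver.HttpServer`, and the names `HttpMemoryStreamSketch` and `startServer` are hypothetical.

```scala
import java.net.InetSocketAddress
import scala.io.Source

import com.sun.net.httpserver.{HttpExchange, HttpHandler, HttpServer}
import org.apache.spark.sql.{SQLContext, SparkSession}
import org.apache.spark.sql.execution.streaming.MemoryStream

// Hypothetical sketch: names and structure are assumptions, not the project's code.
object HttpMemoryStreamSketch {

  // Start a JDK HTTP server that hands every request body to `onBody`.
  def startServer(port: Int, onBody: String => Unit): HttpServer = {
    val server = HttpServer.create(new InetSocketAddress("localhost", port), 0)
    server.createContext("/", new HttpHandler {
      override def handle(exchange: HttpExchange): Unit = {
        val source = Source.fromInputStream(exchange.getRequestBody, "UTF-8")
        val body = try source.mkString finally source.close()
        onBody(body)
        // 200 with no response body (-1 means "no content follows").
        exchange.sendResponseHeaders(200, -1)
        exchange.close()
      }
    })
    server.start()
    server
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("http-memory-stream-sketch")
      .getOrCreate()
    import spark.implicits._
    implicit val sqlContext: SQLContext = spark.sqlContext

    // In-memory source: every POSTed body becomes one row of type String.
    val bodies = MemoryStream[String]
    startServer(9999, body => { bodies.addData(body); () })

    // Print each micro-batch to the console.
    val query = bodies.toDS()
      .writeStream
      .format("console")
      .option("truncate", "false")
      .start()
    query.awaitTermination()
  }
}
```

Keeping the HTTP handling in a separate function makes it possible to exercise the server without starting a Spark session.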

Note that this is for local testing and debugging only. Because it uses MemoryStream underneath, it is not fault-tolerant. Refer to the fault-tolerance semantics in the Structured Streaming documentation.

For more details please refer to the blog post:
Spark Streaming with HTTP REST endpoint serving JSON data

How to use

  1. Run the HttpStreamApp Spark application
  2. POST sample JSON data to http://localhost:9999
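For step 2, any HTTP client will do. As one possible sketch, the following Scala snippet sends a JSON document with the JDK's `HttpURLConnection`. The object name `PostSample`, the helper `postJson`, and the sample payload are hypothetical; the project does not prescribe a schema here.

```scala
import java.net.{HttpURLConnection, URL}
import java.nio.charset.StandardCharsets

// Hypothetical helper for sending test data to the running app.
object PostSample {

  // POST `json` to `endpoint` and return the HTTP status code.
  def postJson(endpoint: String, json: String): Int = {
    val conn = new URL(endpoint).openConnection().asInstanceOf[HttpURLConnection]
    try {
      conn.setRequestMethod("POST")
      conn.setRequestProperty("Content-Type", "application/json")
      conn.setDoOutput(true)
      val out = conn.getOutputStream
      try out.write(json.getBytes(StandardCharsets.UTF_8)) finally out.close()
      conn.getResponseCode
    } finally conn.disconnect()
  }

  def main(args: Array[String]): Unit = {
    // Sample payload is an assumption; replace it with your own JSON.
    val status = postJson("http://localhost:9999", """{"id": 1, "message": "hello"}""")
    println(s"HTTP status: $status")
  }
}
```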

Demo

Watch: https://www.youtube.com/watch?v=Y9g4oj5GH5k
The Spark app ingests the posted data in Structured Streaming micro-batches and displays it.