Getting Started with Spark Streaming

This repository supports my talk entitled Getting Started with Spark Streaming.

Demo One: Hello World

This demo is easiest to run in IntelliJ IDEA, although you can certainly run it via spark-submit or in another IDE like Eclipse.

Requirements

  1. Apache Spark must be installed locally. Instructions for this are available in my talk entitled Getting Started with Apache Spark.

  2. A netcat implementation must be installed. nc ships with macOS and most Linux distributions; for Windows, ncat is available from the Nmap website.

Process

  1. Run ncat (or nc) to open a listening socket on port 9999:

    ncat -kl 9999

  2. Execute HelloSparkStreaming.scala or HelloSparkStreamingDataFrame.scala.
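
Neither Scala file is reproduced in this README, but here is a minimal sketch of the kind of job HelloSparkStreamingDataFrame.scala represents, assuming Spark Structured Streaming's socket source. The object name and options are illustrative, not a copy of the repository's code: it counts the words you type into the ncat session from step 1.

    import org.apache.spark.sql.SparkSession

    object HelloSparkStreamingDataFrame {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder
          .appName("HelloSparkStreaming")
          .master("local[*]")
          .getOrCreate()
        import spark.implicits._

        // Treat each line arriving on the ncat socket as a streaming row.
        val lines = spark.readStream
          .format("socket")
          .option("host", "localhost")
          .option("port", 9999)
          .load()

        // Classic streaming word count over the socket input.
        val counts = lines.as[String]
          .flatMap(_.split(" "))
          .groupBy("value")
          .count()

        // Print the running counts to the console after each micro-batch.
        counts.writeStream
          .outputMode("complete")
          .format("console")
          .start()
          .awaitTermination()
      }
    }

With this running, each line you type into the ncat window shows up in the Spark console as an updated set of word counts.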

Demo Two: Kafka + Spark Streaming + Cassandra

This demo is easiest to run in IntelliJ IDEA, although you can certainly run it via spark-submit or in another IDE like Eclipse.

Requirements

  1. Apache Spark must be installed locally. Instructions for this are available in my talk entitled Getting Started with Apache Spark.

  2. A netcat implementation must be installed. nc ships with macOS and most Linux distributions; for Windows, ncat is available from the Nmap website.

  3. Docker must be installed and should be configured to run Linux-based containers rather than Windows-based containers.

  4. Apache Kafka must be installed. My preferred option is to use Confluent Platform on Docker, as this works well on Windows.

  5. Cassandra must be installed. My preferred option is to use Cassandra in a Docker container.

Process

  1. Start up Confluent Platform:

    git clone https://github.com/confluentinc/cp-all-in-one
    cd cp-all-in-one
    git checkout 5.5.1-post
    # the docker-compose.yml lives in a nested cp-all-in-one directory
    cd cp-all-in-one/
    docker-compose up -d
  2. Start up Cassandra:

    docker run -p 9042:9042 -p 7000:7000 -p 7001:7001 -p 7199:7199 -p 9160:9160 --name spark-cassandra -d cassandra
  3. Connect to Cassandra. One option is to use the Cassandra workbench in Visual Studio Code. Run the following CQL to create the keyspace:

    CREATE KEYSPACE public
    WITH REPLICATION = {
        'class' : 'SimpleStrategy',
        'replication_factor' : 1
    };

    After running this script to create the keyspace, run the following statement to create the table. The quoted column names keep their case in Cassandra.

    CREATE TABLE public.car (
        "Name" text PRIMARY KEY,
        "Cylinders" int,
        "Horsepower" int
    );
    
  4. Create a new Kafka topic named car.

  5. Run Job.scala in the ksc project's Spark folder. A sketch of what such a job looks like appears after this list.

  6. Load data from data\cars.json into the car topic. Here is an example on Windows:

    kafka-console-producer --broker-list localhost:9092 --topic car < C:\SourceCode\Getting-Started-With-Spark-Streaming\data\cars.json
  7. Run SELECT * FROM public.car against Cassandra and confirm that the data has loaded into the table.
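
Job.scala itself is not reproduced here, but the following is a rough sketch of the shape such a job can take. It assumes the DataStax spark-cassandra-connector, Structured Streaming's Kafka source, and that each message in the car topic is a JSON object whose fields match the table's columns; the names, options, and checkpoint path are illustrative, not a copy of the ksc project's code.

    import org.apache.spark.sql.{DataFrame, SparkSession}
    import org.apache.spark.sql.functions.from_json
    import org.apache.spark.sql.types.{IntegerType, StringType, StructType}

    object Job {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder
          .appName("KafkaToCassandra")
          .master("local[*]")
          .config("spark.cassandra.connection.host", "localhost")
          .getOrCreate()
        import spark.implicits._

        // Schema matching the columns of the public.car table.
        val schema = new StructType()
          .add("Name", StringType)
          .add("Cylinders", IntegerType)
          .add("Horsepower", IntegerType)

        // Read JSON messages from the car topic as they arrive.
        val cars = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092")
          .option("subscribe", "car")
          .load()
          .select(from_json($"value".cast("string"), schema).as("car"))
          .select("car.*")

        // Append each micro-batch to Cassandra through the connector.
        cars.writeStream
          .option("checkpointLocation", "/tmp/car-checkpoint") // illustrative path
          .foreachBatch { (batch: DataFrame, _: Long) =>
            batch.write
              .format("org.apache.spark.sql.cassandra")
              .option("keyspace", "public")
              .option("table", "car")
              .mode("append")
              .save()
          }
          .start()
          .awaitTermination()
      }
    }

Once a job like this is running, the kafka-console-producer call in step 6 should cause rows to appear in public.car within a few seconds.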

Demo Three: .NET and Spark Streaming

These examples are derived from the Microsoft.Spark samples for F#.

Requirements

  1. Docker must be installed and should be configured to run Linux-based containers rather than Windows-based containers.

  2. Apache Kafka must be installed locally if you wish to run the Kafka experiment. My preferred option is to use Confluent Platform on Docker, as this works well on Windows.

Process

  1. Start up Confluent Platform:

    git clone https://github.com/confluentinc/cp-all-in-one
    cd cp-all-in-one
    git checkout 5.5.1-post
    # the docker-compose.yml lives in a nested cp-all-in-one directory
    cd cp-all-in-one/
    docker-compose up -d
  2. In the Confluent Control Center (by default, http://localhost:9021), navigate to Cluster settings, choose Listener, and ensure that the advertised.listeners property uses your host machine's IP address. For example, if your host machine is at 192.168.1.10, the advertised listener should point to that address rather than 127.0.0.1 or localhost; otherwise, the .NET Kafka example will not work.

  3. Create a topic in Kafka named Flights if you wish to run the Kafka demo.

  4. Build the Docker image from the Dockerfile in this repository:

    docker build . -t gswss
  5. Choose the demo you want to run. Both vi and nano are installed in the image, so open the run_spark_dotnet_demo file in either editor and set it to the demo you want:

    docker run --name gswss -it gswss bash
    cd /root
    vi run_spark_dotnet_demo
  6. Run netcat. To do this, open a new console and run the following:

    docker exec -it gswss /bin/bash
    nc -kl 9999

    Alternatively, you can load a large file with the following:

    docker exec -it gswss /bin/bash
    nc -kl 9999 < /root/data/WarAndPeace.txt
  7. Execute the run_spark_dotnet_demo script to run the chosen demo.

    ./run_spark_dotnet_demo