streamsql

Apache Spark consuming Kafka and processing PostgreSQL data into Redshift

Primary language: Scala

Requirements

  1. Maven
  2. Apache Spark
  3. Scala
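Before building, it can help to confirm that the three requirements are actually reachable from the shell. The sketch below is one way to do that. The binary names mvn, spark-submit, and scala are assumptions about how these tools are exposed on your PATH, and check_tools is a hypothetical helper, not part of this repo.

```shell
#!/usr/bin/env bash
# Report which of the given command names are missing from PATH.
# Prints one line per missing tool and returns non-zero if any is missing.
check_tools() {
  local missing=0
  local tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Assumed binary names for the three requirements; adjust them if your
# installation exposes the tools under different names.
check_tools mvn spark-submit scala || echo "Install the tools listed above before building."
```

If nothing is printed, all three commands were found. Checking versions is left out because the source does not state which versions are required.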

Clone the repo

Use the following commands (the first assumes a yum-based distribution such as Amazon Linux):

  1. sudo yum install git

  2. git clone https://github.com/mdmamunhasan/streamsql.git

  3. cd streamsql

Install the code

Use the following command: mvn clean install
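To confirm that the build produced something, you can look for the packaged artifact. Maven writes build output to the target directory by default, and the install phase also copies the artifact into the local repository (usually ~/.m2). The sketch below assumes the project uses JAR packaging, which the source does not confirm; list_built_jars is a hypothetical helper.

```shell
#!/usr/bin/env bash
# List JAR files under <project>/target, Maven's default build output directory.
# Returns non-zero if no JAR is found.
list_built_jars() {
  local project_dir="${1:-.}"
  local found=1
  local jar
  for jar in "$project_dir"/target/*.jar; do
    # An unmatched glob stays literal, so confirm the file really exists.
    if [ -f "$jar" ]; then
      echo "$jar"
      found=0
    fi
  done
  return "$found"
}

# Run from the streamsql directory after the build finishes.
list_built_jars . || echo "No JAR found under ./target; check the Maven output."
```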

Reference

The post that accompanies the AWS Big Data Blog repository linked below demonstrates how to set up Apache Kafka on Amazon EC2, use Spark Streaming on Amazon EMR to process data arriving in Apache Kafka topics, and query the streaming data using Spark SQL on Amazon EMR.

This repo provides:

  • An AWS CloudFormation stack to set up Apache Kafka on Amazon EC2
  • Scripts/code to create the Apache Kafka topic and producer
  • Spark Streaming and Spark SQL code to run on Amazon EMR

For more information about how to set everything up, see the post.

https://github.com/awslabs/aws-big-data-blog.git