This is the code repository for Practical Real-time Data Processing and Analytics, published by Packt. It contains all the supporting project files necessary to work through the book from start to finish.
With the rise of Big Data, there is an increasing need to process large amounts of data continuously, with a shorter turnaround time. Real-time data processing involves continuous input, processing, and output of data, with the requirement that processing time be as short as possible.
This book covers most of the existing and evolving open source technology stack for real-time processing and analytics. You will learn about every aspect of a real-time solution, from the source to presentation and persistence. Through this practical book, you’ll be equipped with a clear understanding of how to solve challenges on your own.
We’ll cover topics such as how to set up components, basic executions, integrations, advanced use cases, alerts, and monitoring. You’ll be exposed to the popular tools used in real-time processing today, such as Apache Spark, Apache Flink, and Apache Storm. Finally, you will put your knowledge to use by implementing all of the techniques in a practical, real-world use case.
All of the code is organized into folders. Each folder is named after the chapter it belongs to, for example, Chapter02.
The code will look like the following:
cp kafka_2.11-0.10.1.1.tgz /home/ubuntu/demo/kafka
cd /home/ubuntu/demo/kafka
tar -xvf kafka_2.11-0.10.1.1.tgz
This book is intended to bring readers up to speed with real-time streaming technologies. We expect readers to have a fundamental knowledge of Java and Scala. In terms of setup, we expect readers to have a basic Maven, Java, and Eclipse setup in place to run the examples.