- Abstract
- Learning Objectives
- Prerequisites
- Training Outline and Timeline
- Welcome Breakfast
- Module 1: Introduction to Real-Time Analytics and Overview of Technologies
- Module 2: Setting Up Local Clusters with Docker
- Break
- Module 3: Ingesting Data into Apache Pinot
- Lunch Break 12:00 PM - 1:00 PM
- Module 4: Integrating Kafka with Pinot for Real-Time Data Ingestion
- Break 2:00 PM - 2:30 PM
- Module 5: Stream Processing with Apache Flink
- Wrap-Up and Q&A
- Equipment and Software Check
- Let’s Get Going!
- Practice Part Overview
Apache Pinot is a high-performance database engineered to serve analytical queries with extremely high concurrency, boasting latencies as low as tens of milliseconds. It excels at ingesting streaming data from sources like Apache Kafka and is optimized for real-time, user-facing analytics applications.
In this full-day training, we will explore the architectures of Apache Kafka, Apache Flink, and Apache Pinot. We will run local clusters of each system, studying the role each plays in a real-time analytics pipeline. Participants will begin by ingesting static data into Pinot and querying it. Not content to stop there, we’ll add a streaming data source in Kafka and ingest that into Pinot as well, showing how both data sources can work together to enrich an application. We’ll then examine which analytics operations belong in the analytics data store (Pinot) and which should be computed before ingestion; these operations will be implemented in Flink. Having put all three technologies to use in hands-on exercises, you’ll leave prepared to begin exploring them together for your own real-time, user-facing analytics applications.
At the successful completion of this training, you will be able to:
- List the essential components of Pinot, Kafka, and Flink.
- Explain the architecture of Apache Pinot and its integration with Apache Kafka and Apache Flink.
- Form an opinion about the proper role of Kafka, Flink, and Pinot in a real-time analytics stack.
- Implement basic stream processing tasks with Apache Flink.
- Create a table in Pinot, including schema definition and table configuration.
- Ingest batch data into an offline table and streaming data from a Kafka topic into a real-time table.
- Use the Pinot UI to monitor and observe your Pinot cluster.
- Use the Pinot Query Console to run SQL queries against your data.
To participate in this workshop, you will need the following:
- Docker Desktop: We will use Docker to run Pinot, Kafka, and Flink locally. If you need to install it, download Docker Desktop from https://www.docker.com/get-started/ and follow the installation instructions.
- Resources: Pinot works well in Docker but is not designed as a desktop solution. Running it locally requires a minimum of 8 GB of memory and 10 GB of disk space (a quick way to verify your Docker limits is sketched below).
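If you are unsure whether your installation meets these limits, the standard Docker CLI can tell you; a minimal sketch:

```bash
# Memory available to Docker, in bytes (8 GB = 8589934592).
# On Docker Desktop this reflects the allocation under Settings > Resources.
docker info --format '{{.MemTotal}}'

# Summary of disk space used by images, containers, and volumes.
docker system df
```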
Duration: 7:30 AM - 9:00 AM
- Participants arrive and enjoy breakfast
- Time for networking and setting up personal workstations; laptop users can locate power sources
Duration: 1 hr (9:00 AM - 10:00 AM) Speaker: Viktor
- Discuss the concept of real-time analytics
- Overview of Apache Kafka, Apache Flink, and Apache Pinot
- How these technologies work together in a real-time analytics stack
Duration: 30 min (10:00 AM - 10:30 AM) Speakers: Viktor (Lead), Upkar (TA)
- Guide on installing Docker (if not pre-installed)
- Setup of Apache Kafka, Apache Flink, and Apache Pinot clusters:
  - Checking internet connectivity
  - Pulling images
  - Running a smoke test
- Ensuring everyone’s local environment is configured correctly (a smoke-test sketch follows below)
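A minimal smoke test might look like the following. It assumes each project’s default port (Pinot controller on 9000, Flink on 8081, Kafka on 9092); the workshop’s Docker setup may map them differently.

```bash
# Pinot controller health endpoint
curl -fsS http://localhost:9000/health && echo "Pinot controller OK"

# Flink JobManager REST API (also serves the web UI)
curl -fsS http://localhost:8081/overview && echo "Flink OK"

# Kafka: confirm the broker port accepts TCP connections
nc -z localhost 9092 && echo "Kafka OK"
```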
Duration: 1 hr (11:00 AM - 12:00 PM) Speaker: Viktor
- Creating a schema and configuring a table in Pinot
- Ingesting static data into an offline table
- Querying data in Apache Pinot (an end-to-end sketch follows below)
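As a compressed sketch of the flow this module walks through, using the Pinot controller’s REST API: the `movies` schema and its fields are purely illustrative (the workshop uses its own dataset and helper scripts), ports assume the defaults, and the exact required table-config fields can vary by Pinot version.

```bash
# 1. Register a minimal schema with the controller (illustrative fields).
cat > movies_schema.json <<'EOF'
{
  "schemaName": "movies",
  "dimensionFieldSpecs": [
    {"name": "title", "dataType": "STRING"},
    {"name": "genre", "dataType": "STRING"}
  ],
  "metricFieldSpecs": [
    {"name": "rating", "dataType": "DOUBLE"}
  ]
}
EOF
curl -X POST -H "Content-Type: application/json" \
  -d @movies_schema.json http://localhost:9000/schemas

# 2. Create an OFFLINE table that references the schema.
cat > movies_table.json <<'EOF'
{
  "tableName": "movies",
  "tableType": "OFFLINE",
  "segmentsConfig": {"schemaName": "movies", "replication": "1"},
  "tenants": {},
  "tableIndexConfig": {},
  "metadata": {}
}
EOF
curl -X POST -H "Content-Type: application/json" \
  -d @movies_table.json http://localhost:9000/tables

# 3. Once data is ingested, query via the broker's SQL endpoint
#    (the Query Console in the Pinot UI issues the same kind of query).
curl -X POST -H "Content-Type: application/json" \
  -d '{"sql": "SELECT title, rating FROM movies ORDER BY rating DESC LIMIT 10"}' \
  http://localhost:8099/query/sql
```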
Duration: 1 hr (1:00 PM - 2:00 PM) Speakers: Viktor (Lead), Upkar (TA)
- Kafka 101 refresher
- Setting up a Kafka topic
- Streaming data ingestion from Kafka to a real-time table in Pinot (sketched below)
- Using the Pinot UI to monitor and manage the cluster
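A sketch of the two moving parts in this module: a Kafka topic, and a REALTIME table config that tells Pinot how to consume it. Container, topic, and schema names here are assumptions (the workshop repo defines its own), `kafka:9092` presumes Pinot can reach the broker by that hostname inside the Docker network, and the consumer factory class name can differ across Pinot versions.

```bash
# Create the topic (on Confluent images the script is `kafka-topics`,
# without the .sh suffix; the container name is an assumption).
docker exec -it kafka kafka-topics.sh --create \
  --topic movie_ratings \
  --bootstrap-server localhost:9092 \
  --partitions 1 --replication-factor 1

# REALTIME table config: streamConfigs point Pinot at the topic.
cat > ratings_realtime.json <<'EOF'
{
  "tableName": "movie_ratings",
  "tableType": "REALTIME",
  "segmentsConfig": {"schemaName": "movie_ratings", "replicasPerPartition": "1"},
  "tenants": {},
  "tableIndexConfig": {
    "streamConfigs": {
      "streamType": "kafka",
      "stream.kafka.topic.name": "movie_ratings",
      "stream.kafka.broker.list": "kafka:9092",
      "stream.kafka.consumer.type": "lowlevel",
      "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
      "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder"
    }
  },
  "metadata": {}
}
EOF
curl -X POST -H "Content-Type: application/json" \
  -d @ratings_realtime.json http://localhost:9000/tables
```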
Duration: 1 hr (2:30 PM - 3:30 PM) Speakers: Upkar (Lead), Viktor (TA)
- Basic concepts of stream processing
- Implementing stream processing tasks with Apache Flink
- Enriching Kafka streams before ingestion into Pinot (a Flink SQL sketch follows below)
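For instance, a simple enrichment job can be expressed in Flink SQL and submitted through the SQL client. The sketch below is illustrative rather than the workshop’s actual exercise: the container name, topics, and fields are assumptions, and it presumes the Kafka connector JARs are on the Flink classpath.

```bash
# Illustrative Flink SQL job: read raw events from one Kafka topic,
# add a derived field, and write to a second topic for Pinot to ingest.
cat > enrich.sql <<'EOF'
CREATE TABLE raw_ratings (
  title STRING,
  rating DOUBLE
) WITH (
  'connector' = 'kafka',
  'topic' = 'raw_ratings',
  'properties.bootstrap.servers' = 'kafka:9092',
  'scan.startup.mode' = 'earliest-offset',
  'format' = 'json'
);

CREATE TABLE enriched_ratings (
  title STRING,
  rating DOUBLE,
  rating_bucket STRING
) WITH (
  'connector' = 'kafka',
  'topic' = 'enriched_ratings',
  'properties.bootstrap.servers' = 'kafka:9092',
  'format' = 'json'
);

-- Pre-compute a classification that Pinot would otherwise evaluate per query
INSERT INTO enriched_ratings
SELECT title, rating,
       CASE WHEN rating >= 4.0 THEN 'high' ELSE 'low' END
FROM raw_ratings;
EOF

# Copy the script into the JobManager container and run it.
docker cp enrich.sql flink-jobmanager:/tmp/enrich.sql
docker exec -it flink-jobmanager ./bin/sql-client.sh -f /tmp/enrich.sql
```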
Duration: 30 min (3:30 PM - 4:00 PM) Speakers: Viktor, Upkar
- Recap of the day’s lessons
- Open floor for questions
- Discussion of potential use cases in participants’ work
To ensure you are fully prepared for the workshop, please follow these guidelines:
- Version Control:
  - Check out the latest version of the workshop repository to access all necessary materials and scripts:
    git clone https://github.com/gAmUssA/uncorking-analytics-with-pinot-kafka-flink.git
    cd uncorking-analytics-with-pinot-kafka-flink
- Docker:
  - Install Docker if it isn’t already installed on your system. Download it from https://www.docker.com/products/docker-desktop.
  - Before the workshop begins, pull the necessary Docker images to ensure you have the latest versions:
    make pull_images
- Integrated Development Environment (IDE):
  - Install Visual Studio Code (VSCode) to edit and view the workshop materials comfortably. Download VSCode from https://code.visualstudio.com/.
  - Add the AsciiDoc extension from the Visual Studio Code marketplace to enhance your experience with AsciiDoc-formatted documents.
- Validate Setup:
  - Before diving into the workshop exercises, verify that all Docker containers needed for the workshop are running correctly:
    docker ps
  - This command confirms that the expected containers are up, ensuring smooth operation during the workshop (a filtered variant is shown below).
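To narrow the output to the workshop’s services, you can filter by name; the name patterns below are assumptions and should be adjusted to whatever the compose file actually calls the containers.

```bash
# List only the workshop containers with their status.
docker ps --format '{{.Names}}\t{{.Status}}' | grep -Ei 'pinot|kafka|flink'
```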
- Using VSCode:
  - Open the workshop directory in VSCode to access and edit files easily. Use the AsciiDoc extension to view the formatted documents and instructions:
    code .
- Docker Issues:
  - If Docker containers fail to start or crash, use the following command to inspect the logs and identify potential issues:
    docker logs <container_name>
  - This can help in diagnosing problems with specific services (see below for following logs live).
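When a container is crash-looping, limiting and following the log output is often more useful than dumping it all at once:

```bash
# Show the last 100 lines, then keep streaming new output.
docker logs --tail 100 -f <container_name>
```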
- Network Issues:
  - Ensure no applications are blocking the required ports. If ports are in use or blocked, reconfigure the services to use alternative ports or stop the conflicting applications (a port check is sketched below).
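On macOS and Linux, `lsof` can identify which process holds a port. Port 9000 below is the Pinot controller’s default; the same check applies to 8081 (Flink), 9092 (Kafka), and any others the compose file maps.

```bash
# Identify the process listening on a given TCP port.
lsof -nP -iTCP:9000 -sTCP:LISTEN
```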
- Removing Docker Containers:
  - To clean up after the workshop, you may want to remove the Docker containers used during the session to free up resources:
    make stop_containers
  - Additionally, prune unused Docker images and volumes to recover disk space:
    docker system prune -a
    docker volume prune
These steps and tips are designed to prepare you thoroughly for the workshop and to help address common issues that might arise, ensuring a focused and productive learning environment.
The practical exercises of this workshop are divided into three distinct parts, each designed to give you hands-on experience with Apache Pinot’s capabilities in different scenarios. Below are the details and objectives for each part: