LOW-LATENCY SALES BI DATA PIPELINE

Overview

The data pipeline is designed to provide near real-time analysis of the sales for a fictional company. The project has following components:

Producer
- kafkaProducer.py file generates dummy sales data and streams it with topic 'sales'. The data contains information about three sales attributes: State, Category, Platform
Consumer
- kafkaConsumer.py file consumes the stream by producer with topic 'sales' and writes the records to the collection 'salesRecords' in mongoDB.
Kafka Connect
- The connector is used to perform kafka integration with the Rockset Cluster.
Tableau
- With custom SQL the data from Rockset Cluster is queried and visualized. The dashboard Sales_Dashboard contains the code for custom SQL and visualizations.

The dashboard is published on Tableau Public and can be accessed by following this link