Disposable local tests with Apache Pinot.
The broker and controller services expose REST APIs. In addition, a CLI is available to perform any operation on the cluster, such as defining schemas or adding tables, for example:
pinot-admin.sh AddTable -tableConfigFile /path/to/table.json -schemaFile /path/to/schema.json -controllerHost localhost -controllerPort 9000 -exec
For instance, with the Pinot Docker image:
docker run \
--network=pinot-demo \
--name pinot-streaming-table-creation \
${PINOT_IMAGE} AddTable \
-schemaFile examples/stream/githubEvents/pullRequestMergedEvents_schema.json \
-tableConfigFile examples/stream/githubEvents/docker/pullRequestMergedEvents_realtime_table_config.json \
-controllerHost pinot-controller \
-controllerPort 9000 \
-exec
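Because the controller also exposes a REST API, the same AddTable step can be scripted without the CLI. The Python sketch below only builds the two requests (schema upload to /schemas, then table config to /tables). It assumes those controller endpoints and a controller on localhost:9000, so verify them against the controller's API documentation for your Pinot version.

```python
import json
import urllib.request

# Assumption: the controller listens on localhost:9000, as in the CLI example.
CONTROLLER_URL = "http://localhost:9000"


def json_post(path: str, payload: dict, base_url: str = CONTROLLER_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request to the controller REST API."""
    return urllib.request.Request(
        url=f"{base_url}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def add_table_requests(schema: dict, table_config: dict, base_url: str = CONTROLLER_URL) -> list:
    """Requests mirroring AddTable: upload the schema first, then the table config."""
    return [
        json_post("/schemas", schema, base_url),
        json_post("/tables", table_config, base_url),
    ]


# Usage (hypothetical files; this part actually sends the requests):
# with open("/path/to/schema.json") as s, open("/path/to/table.json") as t:
#     for req in add_table_requests(json.load(s), json.load(t)):
#         with urllib.request.urlopen(req) as resp:
#             print(req.full_url, resp.status)
```

Sending the schema before the table config matters: the table config refers to a schema that must already exist on the controller.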
Pinot supports JSON, Avro, Thrift, Parquet, ORC and Protobuf input formats out of the box. For Avro, decoding via a schema registry is also supported.
For instance, Avro stream ingestion from Kafka can be configured in the streamConfigs section of the table config with:
"streamType": "kafka",
"stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaAvroMessageDecoder",
"stream.kafka.decoder.prop.schema.registry.rest.url": "http://localhost:2222/schemaRegistry",
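The three keys above are only part of a stream configuration. A sketch of a fuller streamConfigs map, assuming the commonly used stream.kafka.topic.name and stream.kafka.broker.list keys and a hypothetical events topic on kafka:9092. Other keys your Pinot version requires (consumer factory, flush thresholds) are deliberately left out.

```python
import json

# Decoder class and registry key copied from the snippet above.
AVRO_DECODER = "org.apache.pinot.plugin.stream.kafka.KafkaAvroMessageDecoder"


def avro_stream_configs(topic: str, broker_list: str, registry_url: str) -> dict:
    """Return a partial streamConfigs map for Avro ingestion from Kafka.

    Assumption: stream.kafka.topic.name and stream.kafka.broker.list are the
    Pinot keys for the topic and the Kafka bootstrap servers.
    """
    return {
        "streamType": "kafka",
        "stream.kafka.topic.name": topic,
        "stream.kafka.broker.list": broker_list,
        "stream.kafka.decoder.class.name": AVRO_DECODER,
        "stream.kafka.decoder.prop.schema.registry.rest.url": registry_url,
    }


# Hypothetical topic and broker address for the local setup.
configs = avro_stream_configs("events", "kafka:9092", "http://localhost:2222/schemaRegistry")
print(json.dumps({"streamConfigs": configs}, indent=2))
```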
In the Docker quickstart, Pinot is set up with Docker Compose:
docker-compose --project-name pinot-demo up
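Services started by Docker Compose take some time before accepting requests. A small polling helper, assuming the controller and broker are published on localhost:9000 and localhost:8099 and answer GET /health with HTTP 200 once ready (check the port mappings in the compose file):

```python
import http.client
import time
import urllib.request
from typing import Callable, Iterable

# Assumption: ports as published by the compose file; adjust if they differ.
HEALTH_URLS = ["http://localhost:9000/health", "http://localhost:8099/health"]


def http_ok(url: str) -> bool:
    """True if GET url answers HTTP 200 within 2 seconds, False on any error."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # URLError/HTTPError and socket timeouts are all OSError subclasses.
        return False


def wait_until_healthy(urls: Iterable[str], check: Callable[[str], bool] = http_ok,
                       timeout: float = 120.0, interval: float = 2.0) -> bool:
    """Poll the URLs until all report healthy (True) or the timeout expires (False)."""
    deadline = time.monotonic() + timeout
    pending = list(urls)
    while pending:
        # Keep only the services that are not ready yet.
        pending = [u for u in pending if not check(u)]
        if not pending:
            break
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True


# Usage, after starting the stack in the background with "up -d":
# wait_until_healthy(HEALTH_URLS)
```

The check function is injectable so the polling logic can be exercised without a running cluster.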
By default, Apache Pinot runs with ephemeral storage, so in order to persist ingested data a deep store such as Amazon S3 has to be configured (also documented as a tutorial here).
In this example we use MinIO, an S3-compatible object store. You can log in to its web console at localhost:9101 using the default minioadmin user and minioadmin password.
Under Identity > Users, create a new user with miniodeepstorage as both username and password, then create a bucket named pinot-events. Do not forget to grant the user readwrite access to the bucket (MinIO's built-in readwrite policy). See the example controller-conf.conf file if you want to change these default settings.
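For reference, a sketch of the S3-related properties a controller configuration pointing at MinIO might contain, rendered in the key=value format of controller-conf.conf. The property names follow Pinot's S3 deep-store documentation as we understand it; the MinIO endpoint http://minio:9000, the region and the data directory are hypothetical, so compare with the repository's actual controller-conf.conf before relying on them. Servers need matching pinot.server.* settings as well.

```python
# Assumption: property names as in Pinot's S3 deep-store docs; endpoint, region
# and data dir below are hypothetical placeholders for this MinIO setup.
S3_DEEP_STORE = {
    "controller.data.dir": "s3://pinot-events/controller-data",
    "controller.local.temp.dir": "/tmp/pinot-tmp-data",
    "pinot.controller.storage.factory.class.s3": "org.apache.pinot.plugin.filesystem.S3PinotFS",
    "pinot.controller.storage.factory.s3.region": "us-east-1",
    "pinot.controller.storage.factory.s3.endpoint": "http://minio:9000",
    "pinot.controller.storage.factory.s3.accessKey": "miniodeepstorage",
    "pinot.controller.storage.factory.s3.secretKey": "miniodeepstorage",
    "pinot.controller.segment.fetcher.protocols": "file,http,s3",
    "pinot.controller.segment.fetcher.s3.class": "org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher",
}


def render_properties(props: dict) -> str:
    """Render a dict as key=value lines, one property per line."""
    return "".join(f"{key}={value}\n" for key, value in props.items())


print(render_properties(S3_DEEP_STORE), end="")
```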
Please head here to ingest a batch of data into Pinot, and here to connect a Kafka topic.
For instance, the data folder contains a schema and a table configuration taken from the pinot-minio recipe.
Create an events topic, then add the realtime table as follows:
docker exec -it pinot-controller bin/pinot-admin.sh AddTable \
-tableConfigFile /config/table-realtime.json \
-schemaFile /config/schema.json -exec
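To check that the table was created, the controller can be asked for its table list. This sketch assumes the controller is published on localhost:9000 and that GET /tables returns a JSON object of the form {"tables": [...]}; the table name events is a guess based on the topic name.

```python
import json
import urllib.request

# Assumption: controller published on localhost:9000.
CONTROLLER_URL = "http://localhost:9000"


def table_names(body: str) -> list:
    """Extract table names from a GET /tables response body."""
    return json.loads(body).get("tables", [])


def table_exists(name: str, base_url: str = CONTROLLER_URL) -> bool:
    """Ask the controller whether a table with this raw name exists."""
    with urllib.request.urlopen(f"{base_url}/tables", timeout=5) as resp:
        return name in table_names(resp.read().decode("utf-8"))


# Usage, after the AddTable command above (hypothetical table name):
# print(table_exists("events"))
```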
Finally, use the generate_data.sh script to generate random samples. You can also inspect the topic with the Kafka command-line tools shipped in the Kafka container, for instance:
kafka-console-consumer --topic events --bootstrap-server localhost:9092 --from-beginning
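If generate_data.sh does not fit your needs, a Python equivalent is easy to write. The field names below (id, type, value, ts) are hypothetical and must be aligned with schema.json in the data folder; the sketch only prints JSON lines, which can then be fed to kafka-console-producer.

```python
import json
import random
import time
import uuid

# Hypothetical event shape: align these keys with schema.json in the data folder.
EVENT_TYPES = ["click", "view", "purchase"]


def random_event(rng: random.Random) -> dict:
    """Build one random sample event as a JSON-serializable dict."""
    return {
        "id": str(uuid.UUID(int=rng.getrandbits(128))),
        "type": rng.choice(EVENT_TYPES),
        "value": round(rng.uniform(0, 100), 2),
        "ts": int(time.time() * 1000),  # epoch milliseconds
    }


def generate(n: int, seed: int = 42) -> list:
    """Return n events as JSON lines; all fields except ts are reproducible per seed."""
    rng = random.Random(seed)
    return [json.dumps(random_event(rng)) for _ in range(n)]


# Usage: feed the printed lines to
#   kafka-console-producer --topic events --bootstrap-server localhost:9092
for line in generate(3):
    print(line)
```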
Using the service names defined in the Docker Compose file, you can connect via a SQLAlchemy URI of the form:
pinot+http://pinot-broker:8099/query?controller=http://pinot-controller:9000/
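A sketch of how that URI is put together from the compose service names, plus a raw query request against the broker for debugging without SQLAlchemy. It assumes the broker accepts POST /query/sql with a JSON body {"sql": ...}; using the URI from SQLAlchemy or Superset additionally requires the pinotdb driver package, which is not shown here.

```python
import json
import urllib.request


def pinot_sqlalchemy_uri(broker: str, controller: str,
                         broker_port: int = 8099, controller_port: int = 9000) -> str:
    """Build a pinot+http SQLAlchemy URI from Docker Compose service names."""
    return (f"pinot+http://{broker}:{broker_port}/query"
            f"?controller=http://{controller}:{controller_port}/")


def broker_sql_request(sql: str, broker_url: str = "http://pinot-broker:8099") -> urllib.request.Request:
    """Build (but do not send) a POST to the broker's /query/sql endpoint (assumed)."""
    return urllib.request.Request(
        f"{broker_url}/query/sql",
        data=json.dumps({"sql": sql}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Usage from inside the compose network (hypothetical table name):
# with urllib.request.urlopen(broker_sql_request("SELECT COUNT(*) FROM events")) as r:
#     print(json.load(r))
```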
Please also refer to the official documentation here.
To run on a local Kubernetes cluster instead, make sure you have kind (https://kind.sigs.k8s.io/) installed and pick the latest node image version here.
export KIND_IMAGE=kindest/node:v1.24.0@sha256:0866296e693efe1fed79d5e6c7af8df71fc73ae45e3679af05342239cdc5bc8e
cat kind-cluster.yaml | envsubst | kind create cluster --config=-
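The kind-cluster.yaml file is piped through envsubst so that ${KIND_IMAGE} is replaced by the exported value. The same substitution in Python with string.Template, using a hypothetical minimal cluster definition (the repository's file may add port mappings or worker nodes); the cluster name pinot-cluster matches the kind delete command at the end.

```python
import os
from string import Template

# Hypothetical minimal kind config; ${KIND_IMAGE} is filled in from the environment.
KIND_CLUSTER_TEMPLATE = """\
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
name: pinot-cluster
nodes:
  - role: control-plane
    image: ${KIND_IMAGE}
"""


def render(template: str, env=None) -> str:
    """Substitute $VAR / ${VAR} placeholders.

    Unlike envsubst, which silently substitutes an empty string, this raises
    KeyError when a variable is unset.
    """
    return Template(template).substitute(os.environ if env is None else env)


# Usage: a script printing render(KIND_CLUSTER_TEMPLATE) can replace
# "cat kind-cluster.yaml | envsubst" in front of "kind create cluster --config=-".
```

Failing loudly on an unset KIND_IMAGE avoids creating a cluster from a config with an empty image field.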
kubectl create namespace kafka
kubectl create -f 'https://strimzi.io/install/latest?namespace=kafka' -n kafka
kubectl apply -f k8s/kafka-cluster.yaml -n kafka
kubectl create ns pinot
helm repo add pinot https://raw.githubusercontent.com/apache/pinot/master/kubernetes/helm
helm install pinot pinot/pinot --namespace pinot --set cluster.name=pinot
kubectl create ns superset
helm repo add superset https://apache.github.io/superset
helm upgrade --install superset superset/superset --namespace superset
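Superset and Pinot live in different namespaces, so Superset has to address the broker and controller by their namespaced DNS names. A sketch that builds the corresponding SQLAlchemy URI, assuming the Pinot chart names its services <release>-broker (port 8099) and <release>-controller (port 9000) and that the pinotdb driver is available in the Superset image:

```python
def k8s_service_host(service: str, namespace: str, cluster_domain: str = "cluster.local") -> str:
    """Fully qualified in-cluster DNS name of a Kubernetes Service."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


def superset_pinot_uri(release: str = "pinot", namespace: str = "pinot") -> str:
    """SQLAlchemy URI Superset could use to reach Pinot across namespaces.

    Assumption: the chart's services are <release>-broker and <release>-controller.
    """
    broker = k8s_service_host(f"{release}-broker", namespace)
    controller = k8s_service_host(f"{release}-controller", namespace)
    return f"pinot+http://{broker}:8099/query?controller=http://{controller}:9000/"
```

Check the actual service names with kubectl get svc -n pinot before pasting the URI into Superset.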
kind delete cluster --name=pinot-cluster