/trino-superset-demo

Demo application to showcase integration of Trino with Apache superset using Minio and Hive metastore

Primary LanguageDockerfile

Trino Superset Demo

This repository contains a demo integration of trino with Apache Superset. Many times, I like to visualize contents in formats like parquet, csv, json etc., But getting them in Apache Superset was little hard for me. With this repository, we can visualize columnar file formats.

The complete stack is runable in local. Useful for doing local data analytics experimentation.

Tech stack

  • Trino (formerly PrestoSQL)
  • Minio - for hosting the file. AWS S3 compatible.
  • Hive Metastore - for accessing files from Trino using Hive connector
  • Apache superset - for visualizing

The following file types are supported for the Hive connector:

  • ORC
  • Parquet
  • Avro
  • RCText (RCFile using ColumnarSerDe)
  • RCBinary (RCFile using LazyBinaryColumnarSerDe)
  • SequenceFile
  • JSON (using org.apache.hive.hcatalog.data.JsonSerDe)
  • CSV (using org.apache.hadoop.hive.serde2.OpenCSVSerde)
  • TextFile

Getting started

  • docker-compose up
  • Docker volumes are locally mounted. This should help in understanding the data of different service.

Setup superset

  • First time : sh superset_init.sh
  • In Superset, add trino with SqlAlchemy URI - trino://hive@trino-coordinator:8080/hive

Service links

  • Superset (username: admin, password: admin)
  • Minio - username: minio_access_key, password: minio_secret_key)
  • Trino

Trino CLI

docker exec -it trino trino

Run SQL commands listed trino/init.sql

Useful commands

# restart just trino
docker-compose restart trino-coordinator