This repository contains a demo integration of Trino with Apache Superset.
I often want to visualize the contents of files in formats such as Parquet, CSV, and JSON, but getting them into Apache Superset was a little hard for me.
With this repository, we can visualize columnar file formats.
The complete stack runs locally, which makes it useful for local data analytics experimentation. It consists of:
- Trino (formerly PrestoSQL)
- MinIO - for hosting the files; AWS S3 compatible
- Hive Metastore - for accessing files from Trino using Hive connector
- Apache Superset - for visualization
The following file types are supported for the Hive connector:
- ORC
- Parquet
- Avro
- RCText (RCFile using ColumnarSerDe)
- RCBinary (RCFile using LazyBinaryColumnarSerDe)
- SequenceFile
- JSON (using org.apache.hive.hcatalog.data.JsonSerDe)
- CSV (using org.apache.hadoop.hive.serde2.OpenCSVSerde)
- TextFile
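As a rough illustration of how one of these formats is exposed to Trino, the sketch below builds a `CREATE TABLE` statement that points the Hive connector at files already sitting in object storage. It relies on the connector's `format` and `external_location` table properties. The table name `hive.demo.trips`, the column list, and the `s3a://demo/trips/` location are hypothetical placeholders, not names taken from this repository; the statements actually used here are in trino/init.sql.

```python
# Format names accepted by the Trino Hive connector's `format` table property,
# matching the list above.
HIVE_FORMATS = {
    "ORC", "PARQUET", "AVRO", "RCTEXT", "RCBINARY",
    "SEQUENCEFILE", "JSON", "CSV", "TEXTFILE",
}


def create_external_table_sql(table, columns, file_format, location):
    """Build a CREATE TABLE statement for files that already exist in object storage.

    table       -- fully qualified name, e.g. "hive.demo.trips" (hypothetical)
    columns     -- list of (name, trino_type) pairs
    file_format -- one of HIVE_FORMATS (case-insensitive)
    location    -- directory holding the files, e.g. "s3a://demo/trips/" (hypothetical)
    """
    fmt = file_format.upper()
    if fmt not in HIVE_FORMATS:
        raise ValueError(f"unsupported format: {file_format}")
    if fmt == "CSV" and any(t.lower() != "varchar" for _, t in columns):
        # The Hive connector only allows unbounded varchar columns in CSV tables.
        raise ValueError("CSV tables must use varchar columns only")
    column_sql = ",\n".join(f"    {name} {trino_type}" for name, trino_type in columns)
    return (
        f"CREATE TABLE {table} (\n{column_sql}\n)\n"
        f"WITH (\n"
        f"    format = '{fmt}',\n"
        f"    external_location = '{location}'\n"
        f")"
    )


if __name__ == "__main__":
    print(create_external_table_sql(
        "hive.demo.trips",
        [("trip_id", "bigint"), ("fare", "double")],
        "parquet",
        "s3a://demo/trips/",
    ))
```

The printed statement can be pasted into the Trino CLI. The location should be the directory that contains the data files, not a single file.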
docker-compose up
- Docker volumes are mounted locally. This should help in understanding the data each service stores.
- First time only, initialize Superset:
sh superset_init.sh
- In Superset, add Trino with the SQLAlchemy URI:
trino://hive@trino-coordinator:8080/hive
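To make the pieces of that URI explicit, here is a small sketch that splits it using only the Python standard library. It assumes the usual layout of the Trino SQLAlchemy dialect, `trino://user@host:port/catalog[/schema]`. The helper name `describe_trino_uri` is hypothetical, and reading `trino-coordinator` as the docker-compose service name is an assumption based on the restart command in this README.

```python
from urllib.parse import urlsplit


def describe_trino_uri(uri):
    """Split a Trino SQLAlchemy URI into its named components."""
    parts = urlsplit(uri)
    # Path segments after the host: first is the catalog, optional second is the schema.
    path = [segment for segment in parts.path.split("/") if segment]
    return {
        "dialect": parts.scheme,
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port,
        "catalog": path[0] if path else None,
        "schema": path[1] if len(path) > 1 else None,
    }


if __name__ == "__main__":
    for key, value in describe_trino_uri("trino://hive@trino-coordinator:8080/hive").items():
        print(f"{key}: {value}")
```

Here `hive` before the `@` is the user Trino sees, while `hive` after the port is the catalog. No schema is given, so Superset lists every schema in that catalog.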
- Superset (username: admin, password: admin)
- MinIO (username: minio_access_key, password: minio_secret_key)
- Trino
docker exec -it trino trino
Then run the SQL commands listed in trino/init.sql.
# restart just trino
docker-compose restart trino-coordinator