/trino-minio-docker

Minimal example to run Trino, Minio, and Hive standalone metastore on docker

MIT LicenseMIT

trino-minio-docker

Minimal example to run Trino with Minio and the Hive standalone metastore on Docker. The data in this tutorial was converted into an Apache Parquet file from the famous Iris data set.

Installation and Setup

Install s3cmd with:

sudo apt update
sudo apt install -y \
    s3cmd \
    openjdk-11-jre-headless  # Needed for trino-cli

Pull and run all services with:

docker-compose up

Configure s3cmd with (or use the minio.s3cfg configuration):

s3cmd --config minio.s3cfg --configure

Use the following configuration for the s3cmd configuration when prompted:

Access Key: minio_access_key
Secret Key: minio_secret_key
Default Region [US]:
S3 Endpoint [s3.amazonaws.com]: localhost:9000
DNS-style bucket+hostname:port template for accessing a bucket [%(bucket)s.s3.amazonaws.com]: localhost:9000
Encryption password:
Path to GPG program [/usr/bin/gpg]:
Use HTTPS protocol [Yes]: no

To create a bucket and upload data to minio, type:

s3cmd --config minio.s3cfg mb s3://iris
s3cmd --config minio.s3cfg put data/iris.parq s3://iris

To list all object in all buckets, type:

s3cmd --config minio.s3cfg la

Access Trino with CLI and Prepare Table

Download trino cli with:

wget https://repo1.maven.org/maven2/io/trino/trino-cli/352/trino-cli-351-executable.jar \
  -O trino
chmod +x trino  # Make it executable

Create schema and create table with:

./trino --execute "
CREATE SCHEMA IF NOT EXISTS minio.iris
WITH (location = 's3a://iris/');

CREATE TABLE IF NOT EXISTS minio.iris.iris_parquet (
  sepal_length DOUBLE,
  sepal_width  DOUBLE,
  petal_length DOUBLE,
  petal_width  DOUBLE,
  class        VARCHAR
)
WITH (
  external_location = 's3a://iris/',
  format = 'PARQUET'
);"

Query the newly created table with:

./trino --execute "
SHOW TABLES IN minio.iris;
SELECT * FROM minio.iris.iris_parquet LIMIT 5;"

License

This project is licensed under the MIT license. See the LICENSE for details.