This project aims to make it easy to get started with Presto. It is based on Docker and Docker Compose. Currently, the following features are supported:
- Single machine set-up
- Function Namespace Manager (for creating functions)
- Hive connector, Hive Metastore, and pseudo-replicated HDFS (i.e., a single data node without replication)
The following should be enough to bring up all required services:
docker-compose up
This brings up a MySQL server, which Presto depends on and which takes a while to start the first time. If starting the services fails the first time, interrupt them (with CTRL+C) and bring them up again.
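If you would rather not restart by hand, one option is to start MySQL first and wait until it answers before bringing up the rest. The following is a minimal sketch, not part of this project: retry is a hypothetical helper, the service name mysql and the container name docker-presto_mysql_1 are assumptions derived from the container names used elsewhere in this README, and the password is the one used in the MySQL command at the end. Treat the ping as a rough readiness check only.

```shell
# Hypothetical helper: rerun a command until it succeeds or the
# given number of attempts is used up.
retry() {
  attempts=$1
  shift
  n=1
  until "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "giving up after $n attempts: $*" >&2
      return 1
    fi
    n=$((n + 1))
    # Pause between attempts; override with RETRY_DELAY (seconds).
    sleep "${RETRY_DELAY:-10}"
  done
}

# Example (commented out so that sourcing this file has no side effects).
# The service and container names below are assumptions:
# docker-compose up -d mysql
# retry 30 docker exec docker-presto_mysql_1 mysqladmin ping -ppassword --silent
# docker-compose up
```

With the defaults above, the example waits up to about five minutes for MySQL before giving up.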
If you are behind a corporate firewall, you will have to configure Maven (which is used to build part of Presto) as follows before running the command above:
export MAVEN_OPTS="-Dhttp.proxyHost=your.proxy.com -Dhttp.proxyPort=3128 -Dhttps.proxyHost=your.proxy.com -Dhttps.proxyPort=3128"
The data/ folder is mounted into the HDFS namenode container, from where you can upload its files to HDFS using the HDFS client in that container (docker-presto_namenode_1 may have a different name on your machine; run docker ps to find out):
docker exec -it docker-presto_namenode_1 hadoop fs -mkdir /dataset
docker exec -it docker-presto_namenode_1 hadoop fs -put /data/file.parquet /dataset/
docker exec -it docker-presto_namenode_1 hadoop fs -ls /dataset
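Since the container name can differ between machines, it may help to look it up instead of typing it. The sketch below assumes that container names end in the service name followed by an index, separated by _ or - (as in docker-presto_namenode_1); container_name is a hypothetical helper, not something this project provides.

```shell
# Hypothetical helper: print the name of the first running container
# whose name ends in "<service>_<index>" or "<service>-<index>".
container_name() {
  docker ps --format '{{.Names}}' | grep -E -- "[_-]$1[_-][0-9]+\$" | head -n 1
}

# Example (commented out; requires the services to be running):
# nn=$(container_name namenode)
# docker exec -it "$nn" hadoop fs -put /data/file.parquet /dataset/
```

Matching on the trailing service name avoids false hits on the project prefix, which itself contains the word presto.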
You can use the Presto CLI included in the Docker containers of this project (adapt the container name if necessary):
docker exec -it docker-presto_presto_1 presto-cli --catalog hive --schema default
Alternatively, you can download the Presto CLI, rename it, make it executable, and run the following:
./presto-cli --server localhost:8080 --catalog hive --schema default
Suppose you have the following file test.json:
{"s": "hello world", "i": 42}
Upload it to /test/test.json on HDFS as described above. Then run the following in the Presto CLI:
CREATE TABLE test (s VARCHAR, i INTEGER) WITH (EXTERNAL_LOCATION = 'hdfs://namenode/test/', FORMAT = 'JSON');
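To check that the table is readable, you can run a query against it without opening an interactive session. This is a sketch under assumptions: run_query and the PRESTO_CONTAINER variable are hypothetical names introduced here, and the default container name is the one used earlier in this README and may differ on your machine. It relies on the Presto CLI's --execute option.

```shell
# Assumed container name; override PRESTO_CONTAINER if yours differs.
PRESTO_CONTAINER=${PRESTO_CONTAINER:-docker-presto_presto_1}

# Hypothetical helper: run one SQL statement non-interactively through
# the Presto CLI bundled in the container.
run_query() {
  docker exec "$PRESTO_CONTAINER" presto-cli \
    --catalog hive --schema default --execute "$1"
}

# Example (commented out; requires the services to be running and the
# table from the statement above to exist):
# run_query 'SELECT s, i FROM test'
```

If the file was uploaded and the table created as described, the query should return the single row from test.json.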
In case you need to make manual changes or want to inspect the MySQL databases, you can connect to the MySQL server like this:
docker exec -it docker-presto_mysql_1 mysql -ppassword