jefflester/minitrino

Add --worker-count option to Minipresto provision command

Closed · 1 comment

Minipresto should be able to provision a multi-node cluster if specified. Example:

minipresto provision -c snowflake-jdbc elasticsearch --worker-count 2

The resulting workers should be built using the Docker SDK, as forcing this through Compose would get messy real quick (Compose ought to be used only where it is absolutely needed). The high-level flow should look like:

  • Provision everything before the workers
  • Spin up n worker containers from the standard Starburst Presto image (with Presto server only and no CLI)
    • Copy everything from coordinator's etc/presto/ directory and make necessary modifications to those files (mainly node.properties and config.properties)
      • This should ideally be done by copying coordinator volumes to the worker containers, but this may not be possible given the lack of labels on non-persistent volumes
    • Apply label sets to worker containers
    • Run a health check to verify worker connectivity to the coordinator
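A minimal sketch of the "spin up n workers and label them" steps with the Docker SDK for Python. The image tag, network name, label set, and container naming scheme below are all assumptions made for illustration, not values taken from Minipresto itself. The health check and config copying are left out.

```python
from typing import Dict, List

# All three constants are hypothetical placeholders for illustration.
WORKER_IMAGE = "starburstdata/presto:latest"
NETWORK = "minipresto_presto"
LABELS = {"com.example.minipresto.role": "worker"}


def worker_run_kwargs(worker_count: int) -> List[Dict]:
    """Build one set of containers.run() keyword arguments per worker."""
    return [
        {
            "image": WORKER_IMAGE,
            "name": f"presto-worker-{i}",  # hypothetical naming scheme
            "detach": True,                # return a Container, don't block
            "network": NETWORK,            # must match the coordinator's network
            "labels": dict(LABELS),        # copy so workers don't share a dict
        }
        for i in range(1, worker_count + 1)
    ]


def start_workers(worker_count: int) -> list:
    """Start the workers; requires the docker package and a running daemon."""
    import docker  # imported lazily so the helper above works without it

    client = docker.from_env()
    return [client.containers.run(**kw) for kw in worker_run_kwargs(worker_count)]
```

Keeping the argument building separate from the SDK call means the per-worker spec can be checked without a Docker daemon.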

This will be a semi-substantial lift, but will improve the flexibility of the tool in a way that doesn't bind us more to Compose shell commands.

Update

After some messing around, it's looking like volume copying might be impossible with the way things are currently set up. The idea was to:

  • Copy all volumes from coordinator to worker
  • Modify config.properties and node.properties in the worker node(s)

This would be nice and simple, but there is no accessible link between a container and the non-persistent volumes mounted to it. For example, if we mount an elasticsearch.properties file to the coordinator, we cannot detect that relationship through the SDK and copy the file to the new worker.
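Setting the volume problem aside, the property edits for a worker are small. A rough sketch, assuming the coordinator files use the usual Presto keys (`coordinator`, `discovery-server.enabled`, `discovery.uri`, `node.id`) and that the coordinator is reachable at the hypothetical hostname `presto`. The function names are made up for this example.

```python
import uuid


def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def render_properties(props: dict) -> str:
    """Write the properties back out, one key=value per line."""
    return "\n".join(f"{k}={v}" for k, v in props.items()) + "\n"


def worker_config(coordinator_config: str, coordinator_host: str = "presto") -> str:
    """Derive a worker config.properties from the coordinator's copy."""
    props = parse_properties(coordinator_config)
    props["coordinator"] = "false"
    # These two settings only make sense on the coordinator.
    props.pop("node-scheduler.include-coordinator", None)
    props.pop("discovery-server.enabled", None)
    port = props.get("http-server.http.port", "8080")
    # Point the worker at the coordinator's discovery service.
    props["discovery.uri"] = f"http://{coordinator_host}:{port}"
    return render_properties(props)


def worker_node_properties(coordinator_node: str) -> str:
    """Derive a worker node.properties; each node needs its own node.id."""
    props = parse_properties(coordinator_node)
    props["node.id"] = str(uuid.uuid4())
    return render_properties(props)
```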

I am thinking of how best to tackle this. The very ugly and inefficient brute force technique would be to copy the entire etc and plugin directories from the coordinator to the host filesystem, then copy those files to the worker. This would be less than ideal...
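For reference, a sketch of that brute-force copy using the SDK's `get_archive` and `put_archive` container methods. It buffers the tar archive in memory rather than writing it to the host filesystem, which is a variation on the approach described above, not the approach itself. The paths in the usage note are hypothetical.

```python
import io
import tarfile


def copy_dir(source, target, src_path: str, dest_parent: str) -> bool:
    """Copy src_path from one container into dest_parent on another.

    `source` and `target` are docker SDK Container objects (or anything
    exposing the same get_archive / put_archive methods).
    """
    # get_archive returns (iterator of tar byte chunks, stat dict).
    stream, _stat = source.get_archive(src_path)
    archive = b"".join(stream)  # whole directory held in memory
    # put_archive extracts the tar under dest_parent, which must exist.
    return target.put_archive(dest_parent, archive)


def archive_members(archive: bytes) -> list:
    """List the file names inside a tar archive, for a quick sanity check."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r") as tar:
        return tar.getnames()
```

Usage would look something like `copy_dir(coordinator, worker, "/usr/lib/presto/etc", "/usr/lib/presto")`, repeated for the plugin directory. This still copies everything, whether or not a given file came from a mounted volume.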

Closing this for now. Minipresto is not intended to create a distributed cluster, as that opens the door for users productionizing the tool. Best to stay with a local/testing model for now.