/dask-ec2

Start a cluster in EC2 for dask.distributed

Primary LanguagePython

ARCHIVED

As of November 3rd, 2020 this respository is now archived. Please consult the Dask Cloud docs page for more information on deploying Dask with cloud resources.

Dask EC2 Build Status Coverage Status

Easily launch a cluster on Amazon EC2 configured with dask.distributed, Jupyter Notebooks, and Anaconda.

Installation

You also install dask-ec2 using pip:

$ pip install dask-ec2

You can also install dask-ec2 and its dependencies from the conda-forge repository using conda:

$ conda install dask-ec2 -c conda-forge

Usage

Note: dask-ec2 uses boto3 to interact with Amazon EC2. You can configure your AWS credentials using Environment Variables or Configuration Files.

The dask-ec2 up command can be used to create and provision a cluster on Amazon EC2:

$ dask-ec2 up --help
Usage: dask-ec2 up [OPTIONS]

Options:
  --keyname TEXT                Keyname on EC2 console  [required]
  --keypair PATH                Path to the keypair that matches the keyname
                                [required]
  --name TEXT                   Tag name on EC2
  --tags TEXT                   Additional EC2 tags.  Comma separated K:V
                                pairs: K1:V1,K2:V2
  --region-name TEXT            AWS region  [default: us-east-1]
  --vpc-id TEXT                 EC2 VPC ID
  --subnet-id TEXT              EC2 Subnet ID on the VPC
  --iaminstance-name TEXT       IAM Instance Name
  --ami TEXT                    EC2 AMI  [default: ami-d05e75b8]
  --username TEXT               User to SSH to the AMI  [default: ubuntu]
  --type TEXT                   EC2 Instance Type  [default: m3.2xlarge]
  --count INTEGER               Number of nodes  [default: 4]
  --security-group TEXT         Security Group Name  [default: dask-ec2-default]
  --security-group-id TEXT      Security Group ID (overwrites Security Group
                                Name)
  --volume-type TEXT            Root volume type  [default: gp2]
  --volume-size INTEGER         Root volume size (GB)  [default: 500]
  --file PATH                   File to save the metadata  [default:
                                cluster.yaml]
  --provision / --no-provision  Provision salt on the nodes  [default: True]
  --anaconda / --no-anaconda    Bootstrap anaconda  [default: True]
  --dask / --no-dask            Install Dask.Distributed in the cluster
                                [default: True]
  --notebook / --no-notebook    Start a Jupyter Notebook in the head node
                                [default: True]
  --nprocs INTEGER              Number of processes per worker  [default: 1]
  --source / --no-source        Install Dask/Distributed from git master
                                [default: False]
  -h, --help                    Show this message and exit.

The minimal required arguments for the dask-ec2 up command are:

$ dask-ec2 up --keyname my_aws_key --keypair ~/.ssh/my_aws_key.pem

This will create a cluster.yaml in the directory that it was executed, and this file is required to use the other commands in the CLI.

Once a cluster is running, the dask-ec2 command can be used to create or destroy a cluster, ssh into nodes, or other functions:

$ dask-ec2
Usage: dask-ec2 [OPTIONS] COMMAND [ARGS]...

Options:
  --version   Show the version and exit.
  -h, --help  Show this message and exit.

Commands:
  anaconda          Provision anaconda
  dask-distributed  dask.distributed option
  destroy           Destroy cluster
  notebook          Provision the Jupyter notebook
  provision         Provision salt instances
  ssh               SSH to one of the node. 0-index
  up                Launch instances