CDP setup in one click

This repository contains a set of scripts that create the minimal CDP assets needed for a demo from a single wrapper script, including:

  • Cloud pre-requisites (bucket, policies, roles, network)
  • Cloud CDP Environment
  • CDP Data Lake
  • Any CDP Data Hub cluster definition

Pre-Requisites

AWS

  • AWS CLI (Instructions)
    • You must run aws configure after installing, and make sure your region is set
  • AWS ssh key (Instructions); alternatively, you can use the field or _field keys set up in our AWS SE accounts
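
The AWS checks above can be sketched as a small pre-flight script. This is only an illustration: check_aws_region is a hypothetical helper that is not part of this repository, and it relies on nothing more than the standard aws configure get region command.

```shell
#!/usr/bin/env bash
# Sketch: confirm the AWS CLI is installed and a default region is configured.
# check_aws_region is a hypothetical helper, not a script from this repository.

check_aws_region() {
    # Fail early when the AWS CLI is not on the PATH.
    if ! command -v aws >/dev/null 2>&1; then
        echo "aws CLI not found; install it first"
        return 1
    fi

    # 'aws configure get region' prints nothing when no region is set.
    region="$(aws configure get region)"
    if [ -z "$region" ]; then
        echo "no default region set; run 'aws configure'"
        return 1
    fi

    echo "region: $region"
}

# Report the result without aborting the calling shell.
check_aws_region || true
```

The region printed here should match the "region" value in your parameters file.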

Azure

  • Azure CLI (Instructions)
    • Run az login after installing to log in
  • ssh key: you will need to paste the public key into your parameters file
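
As a rough illustration of the Azure steps above, the sketch below checks for an active login and prints a public key that can be pasted into the parameters file. The key path ~/.ssh/id_rsa.pub and the show_public_key helper are assumptions made for this example, not something the repository defines.

```shell
#!/usr/bin/env bash
# Sketch: verify the Azure login and print a public key for the parameters file.
# PUB_KEY_FILE and show_public_key are hypothetical; adjust the path to your key.

PUB_KEY_FILE="${PUB_KEY_FILE:-$HOME/.ssh/id_rsa.pub}"

show_public_key() {
    # Print the public key, or explain what is missing.
    if [ -f "$1" ]; then
        cat "$1"
    else
        echo "no public key at $1; generate one with ssh-keygen" >&2
        return 1
    fi
}

if command -v az >/dev/null 2>&1; then
    # 'az account show' fails when no login session is active.
    az account show --output table || echo "not logged in; run 'az login'"
fi

show_public_key "$PUB_KEY_FILE" || true
```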

Note: Azure CML is not supported yet, so don't add it in your parameters file :)

CDP

  • CDP CLI (Instructions)

  • CDP Credential (Instructions)

    • You must set your Workload Password in your CDP Profile (Shortcut)
    • You must generate a CLI Access Key in your CDP Profile, and configure it in your local CDP CLI (Shortcut)
    • Your user should have at least the EnvironmentCreator and IamUser roles
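
One way to confirm that the CLI access key is configured is to make a harmless read-only call. The sketch below assumes that cdp iam get-user and the global --profile option behave as in current CDP CLI releases; check_cdp_cli is a hypothetical helper and does not verify the assigned roles.

```shell
#!/usr/bin/env bash
# Sketch: confirm that a CDP CLI profile has working credentials.
# check_cdp_cli is a hypothetical helper, not a script from this repository.

check_cdp_cli() {
    profile="${1:-default}"

    if ! command -v cdp >/dev/null 2>&1; then
        echo "cdp CLI not found; install it first"
        return 1
    fi

    # Assumption: 'cdp iam get-user' succeeds only with a valid access key.
    if cdp --profile "$profile" iam get-user >/dev/null 2>&1; then
        echo "cdp profile '$profile' is usable"
    else
        echo "cdp profile '$profile' failed; run 'cdp configure --profile $profile'"
        return 1
    fi
}

check_cdp_cli default || true
```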

Parameters file format

Detailed format

{
    Required parameters: 
    "required": {
                
        Prefix used for cdp assets creation:  
        "prefix":         "pvi",
        
        Name of credential to use: 
        "credential":     "pvidal-aws-se-credential",

        Region to use (should also be the default region of your cloud provider cli profile): 
        "region":         "us-east-1",
        
        ssh key to use for cdp instances setup: 
        "key":            "field",

        Workload password to use in CDP: 
        "workload_pwd":   "cdpw0rksh0p",

        Array of datahub to setup (can be empty): 
        "datahub_list": [

            Element 1: 
            {
                Definition from cdp-cluster-definitions folder: 
                "definition": "data-mart.json",

                Custom script from cdp-dh-custom-scripts folder: 
                "custom_script": ""
            },

            Element 2: 
            {
                Definition from cdp-cluster-definitions folder: 
                "definition": "cdp-mod-workshop.json",

                Custom script from cdp-dh-custom-scripts folder: 
                "custom_script": "cdp_mod_wkp.sh"
            }
        ],

        Array of ml workspaces to setup (can be empty): 
        "ml_workspace_list": [

            Element 1: 
            {
                Definition from cml-workspace-definitions folder: 
                "definition": "small_workspace.json",

                Flag to enable monitoring, governance and model metrics (possible values yes or no): 
                "enable_workspace": "no"
            }

        ],

        Array of op database to setup (can be empty): 
        "op_db_list": [

            Element 1: 
            {
                Name of the database you want to create: 
                "database_name": "your_db_name"
            }

        ],
        Array of CDW vw to setup (can be empty): 
        "dw_list": [

            Element 1: 
            {
                Name of the vw you want to create: 
                "name": "vw-name",
                Type of vw you want to create: 
                "type": "hive"
            }
        ]
    },

    Optional (defaulted) parameters (can be empty): 
    "optional": {

        Cloud provider (default: aws, possible values: aws, az): 
        "cloud_provider": "aws", 

        Cloud provider cli profile (AWS-your profile name / AZ-your subscription name or ID) (default: default): 
        "cloud_profile":    "default",

        CDP cli profile (default: default): 
        "cdp_profile":    "default",

        Flag to create cdp credential or not (default: no, possible values: yes, no) 
        "generate_credential": "no",

        NOT SUPPORTED YET Flag to generate minimal cross account role policy or not (default: no, possible values: yes, no) 
        "generate_minimal_cross_account": "no",

        Flag to create network in cloud provider or not (default: no, possible values: yes, no) 
        "create_network": "no",

        CIDR to open in your security group of your network (port 443, 22 and 9443 will be open to this) 
        "sg_cidr": "0.0.0.0/0",

        Use private IPs for env deployment (default: no, possible values: yes, no). NB: For AWS, if this is set to "yes" and "create_network" is set to "no", you must currently use the DEV CDP CLI.
        "use_priv_ips": "no",

        Use existing network for env deployment (path to the network file, see examples in parameters_sample) 
        "existing_network_file": "[path_to_network_file]",

        The Data Lake scale you'd like to have (default: LIGHT_DUTY, possible values: LIGHT_DUTY, MEDIUM_DUTY_HA) 
        "scale": "[LIGHT_DUTY]",

        If creating an environment with private IPs, create a bastion in one of the public subnets that you can proxy through to access all the UIs (default: no, possible values: no, yes).
        "create_bastion": "yes",

        Enable workload analytics (i.e. WXM): (default: --no-enable-workload-analytics, possible values: --enable-workload-analytics, --no-enable-workload-analytics) 
        "workload_analytic": "--enable-workload-analytics",

        Array of custom tags to setup (if empty the scripts will generate project, owner, end_date and deploytool tags): 
        "tags": [
            {
                "key": "my_tag",
                "value": "my_value"
            },
            {
                "key": "my_other_tag",
                "value": "my_other_value"
            }
        ]

    }

}
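
The annotated format above is not valid JSON as written, because the descriptions are interleaved with the keys. As a minimal sketch, the script below writes a stripped-down parameters file using only keys and sample values from the format above, then checks that it parses. The file name my_params.json is a placeholder, and leaving most optional keys out relies on the statement that the optional block can be empty.

```shell
#!/usr/bin/env bash
# Sketch: write a minimal parameters file and check that it is valid JSON.
# my_params.json is a placeholder name; the values are the samples shown above.

PARAM_FILE="${PARAM_FILE:-my_params.json}"

cat > "$PARAM_FILE" <<'EOF'
{
    "required": {
        "prefix": "pvi",
        "credential": "pvidal-aws-se-credential",
        "region": "us-east-1",
        "key": "field",
        "workload_pwd": "cdpw0rksh0p",
        "datahub_list": [
            { "definition": "data-mart.json", "custom_script": "" }
        ],
        "ml_workspace_list": [],
        "op_db_list": [],
        "dw_list": []
    },
    "optional": {
        "cloud_provider": "aws",
        "create_network": "no"
    }
}
EOF

# Validate the syntax with Python's built-in JSON tool when it is available.
if command -v python3 >/dev/null 2>&1; then
    python3 -m json.tool "$PARAM_FILE" >/dev/null && echo "valid JSON: $PARAM_FILE"
fi
```

Note that a trailing comma after the last element of a list or object makes the file invalid JSON.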

Parameters file samples

See parameters_sample folder

Doing all the things (full wrapper)

Creation

Run the wrapper script:

cdp_create_all_the_things.sh <your_param_file> 

Deletion

Run the deletion script:

cdp_delete_all_the_things.sh <your_param_file>
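
Because both wrappers take the parameters file as their only argument, a small guard can catch a mistyped path before any cloud resources are touched. The sketch below is illustrative only: run_wrapper is a hypothetical helper and my_params.json is a placeholder file name.

```shell
#!/usr/bin/env bash
# Sketch: refuse to start a wrapper when the parameters file is missing.
# run_wrapper is a hypothetical helper, not a script from this repository.

run_wrapper() {
    script="$1"
    param_file="$2"

    if [ ! -f "$param_file" ]; then
        echo "parameters file not found: $param_file" >&2
        return 1
    fi

    # Run the wrapper with the parameters file as its single argument.
    bash "$script" "$param_file"
}

# Example usage (creates, then later deletes, real cloud resources):
#   run_wrapper ./cdp_create_all_the_things.sh my_params.json
#   run_wrapper ./cdp_delete_all_the_things.sh my_params.json
```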

Doing some of the things (individual wrappers)

AWS things

Pre-requisites

cdp_aws_pre_reqs.sh <your_param_file>

SDX

cdp_aws_sdx.sh <your_param_file> [<network_file>]

Azure things

Pre-requisites

cdp_az_pre_reqs.sh <your_param_file>

SDX

cdp_az_sdx.sh <your_param_file>

CDP things

Datahub

cdp_create_datahub_things.sh <your_param_file>

CML

cdp_create_ml_things.sh <your_param_file> 

COD

cdp_create_opdb_things.sh <your_param_file> 

CDW

cdp_create_dw_things.sh <your_param_file> 
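
The individual wrappers can be chained by hand when only part of the stack is needed. The sketch below assumes an order of cloud pre-requisites, then SDX, then CDP services; that order is an assumption, not something this README states. run_steps is a hypothetical helper, and it only prints the commands unless DRY_RUN is set to 0.

```shell
#!/usr/bin/env bash
# Sketch: run a chosen subset of the individual wrappers in sequence.
# run_steps is hypothetical; the step order below is an assumption.

aws_steps="cdp_aws_pre_reqs.sh cdp_aws_sdx.sh cdp_create_datahub_things.sh"

run_steps() {
    param_file="$1"
    shift

    for step in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            # Default: show what would be executed without running it.
            echo "would run: $step $param_file"
        else
            # Stop at the first failing step.
            "./$step" "$param_file" || { echo "$step failed" >&2; return 1; }
        fi
    done
}

# $aws_steps is left unquoted on purpose so it splits into separate script names.
run_steps my_params.json $aws_steps
```

Setting DRY_RUN=0 runs the scripts for real from the current directory.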

Starting / Stopping (work in progress)

cdp_stop_all_the_things.sh <your_param_file> 
cdp_start_all_the_things.sh <your_param_file> 

Optional development flags

Note: some flags require the dev CLI and are not meant for public consumption. Use at your own risk.

  • --no-cost-check: removes the cost check
  • --no-db-ha: does not create the DB HA backend
  • --no-sync-users: does not launch sync users to free-ipa

Future Improvements

  • Add support for Azure ML
  • Add support for minimal set of policies for AWS
  • Add dynamic definition updates
  • Create a nifi flow wrapper?

Author & Contributors

Paul Vidal - LinkedIn

Dan Chaffelson - LinkedIn

Chris Perro - LinkedIn

André Araújo - LinkedIn

Nathan Anthony - LinkedIn

Steffen Maerkl - LinkedIn

Mike Riggs - LinkedIn

Ryan Cicak - LinkedIn

Alex Moundalexis - LinkedIn