A general purpose framework for automating Cloudera Products

Cloudera Deploy

Automation wrappers for the Cloudera Ansible Collection

Readme last updated: 2021-05-10

Cloudera Deploy is a toolset for deploying the Cloudera Data Platform (CDP). Its scope includes both Public and Private Cloud products and Base clusters, as well as application setup, execution, and other post-deployment functions.

You can use Cloudera Deploy as your entrypoint for getting started with CDP. The toolset uses straightforward configuration definitions to instruct the automation functions, yet is extensible and highly configurable. The toolset can be a great foundation for custom entrypoints, CI/CD pipelines, and development environments.

Quickstart

  1. Clone this repository into your preferred local project directory, e.g.:
    git clone https://github.com/cloudera-labs/cloudera-deploy.git

  2. Ensure Docker is running, then run the quickstart script to prepare your local runner environment, e.g.:
    cd cloudera-deploy && chmod +x quickstart.sh && ./quickstart.sh

  3. You should now see the orange cldr prompt within the container environment. You may wish to set a default password and other entries in your local Cloudera Deploy user profile before proceeding:
    vi ~/.config/cloudera-deploy/profiles/default

  4. Ensure you have CDP and the appropriate cloud provider credentials present in your user profile:

    1. You may need to issue new AWS API Keys (or Azure, GCP) and set them up with aws configure, etc.

    2. In addition, you may need to do the same with CDP API Keys and set them up with cdp configure.

  5. Run the main playbook with the example definition:
    ansible-playbook /opt/cloudera-deploy/main.yml \
    -e "definition_path=examples/sandbox" \
    -t run,default_cluster

Configuration

Cloudera Deploy is powered by Ansible and provides a standard configuration and execution model for CDP deployments and their applications. It can be run within a container or directly on a host.

Specifically, Cloudera Deploy is an Ansible project that uses a set of playbooks, roles, and tags to construct a runlevel-like management experience for cloud and cluster deployments. It leverages several collections, both Cloudera and third-party.

Software Dependencies

Cloudera Deploy requires a number of host applications, services, and Python libraries for its execution. These dependencies are already packaged for ease-of-use in Cloudera Labs Ansible-Runner, another project within Cloudera Labs.

Alternatively, and especially if you plan on running Cloudera Deploy in your own environment, you may install the dependencies yourself.

Collections and Roles

Cloudera Deploy relies on a number of Ansible collections:

  • cloudera.exe

  • cloudera.cluster

  • cloudera.cloud

And roles:

  • geerlingguy.postgresql

  • ansible-role-mysql

These collection dependencies can be found in the ansible.yml file in the cldr-runner project.

Cloudera Deploy does have a single dependency for its own execution, the community.crypto collection. To install all of these dependencies, you can run the following:

# Get the cldr-runner dependency file first (use the raw file URL, not the GitHub page URL)
curl https://raw.githubusercontent.com/cloudera-labs/cldr-runner/main/payload/deps/ansible.yml --output requirements.yml

# Install the collections (and their dependencies)
ansible-galaxy collection install -r requirements.yml

# Install the roles
ansible-galaxy role install -r requirements.yml

# Install the crypto collection
ansible-galaxy collection install community.crypto

Python and Clients

The supporting Python libraries and other clients can be installed using the various dependencies files in the cldr-runner project directly. You might find it easier to follow the installation instructions for cloudera.exe and cloudera.cluster, the two collections that drive this set of dependencies.

For the community.crypto collection dependency, you will need to ensure that the ssh-keygen executable is on your Ansible controller.
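
One way to confirm this before a deployment is a small preflight check. The following is a minimal sketch only; the helper name require_cmd is hypothetical and is not part of Cloudera Deploy.

```shell
#!/bin/sh
# Hypothetical helper (not part of Cloudera Deploy): succeed only if the
# named executable can be found on the PATH of the Ansible controller.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    return 0
  fi
  echo "Missing required executable: $1" >&2
  return 1
}

# The community.crypto dependency needs ssh-keygen on the controller.
require_cmd ssh-keygen || echo "Install OpenSSH on the controller before continuing."
```

The same helper can be reused for any other client you depend on, e.g. require_cmd ansible-playbook.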

The dependencies cover the full range of the automation tooling, from infrastructure on public or private cloud to the relevant Cloudera platform assets. If you are only working with a limited part of the tooling, then you may not need the full list of dependencies. For example, if you are only working with AWS infrastructure, it is safe to install only those dependencies or to use the tagged cldr-runner version.

User Input Dependencies

Cloudera Deploy does require a small set of user-supplied information for a successful deployment. A minimum set of user inputs is defined in a profile file (see the profile.yml template for details). For example, the profile.yml should define your password for the Administrator account of the deployed services.

The default location for profiles is ~/.config/cloudera-deploy/profiles/. Cloudera Deploy looks for the default file in this directory unless the Ansible runtime variable profile is set, e.g. -e profile=my_custom_profile. Creating additional profiles is simple, and you can use the profile.yml template as your starting point.
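
As an illustration of the profile lookup described above, the sketch below prints the file that would correspond to a given profile name. The helper name profile_path is hypothetical, and the sketch assumes that the profile variable holds a file name inside the default profiles directory.

```shell
#!/bin/sh
# Default profile directory, as described in this readme.
PROFILE_DIR="${HOME}/.config/cloudera-deploy/profiles"

# Hypothetical helper: print the profile file for a profile name, falling
# back to "default" when no name is supplied.
# Assumption: the "profile" variable is a file name within PROFILE_DIR.
profile_path() {
  echo "${PROFILE_DIR}/${1:-default}"
}

# Example (not executed here): start a new profile from the profile.yml
# template, then select it at run time.
#   cp <location of cloudera-deploy>/profile.yml "$(profile_path my_custom_profile)"
#   ansible-playbook main.yml -e "definition_path=my_example" -e "profile=my_custom_profile"
```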

CDP Public Cloud

For CDP Public Cloud, you will need an Access Key and Secret set in your user profile. The tooling uses your default profile unless you instruct it otherwise. (See Configuring CDP client with the API access key.)

Cloud Providers

For Azure and AWS infrastructure, the process is similar: the tooling uses the default credentials in your user profile, and you may likewise override them.

For Google Cloud, we suggest you issue a credentials file, store it securely in your profile, and then provide the path to that file in profile.yml, as this works best with both CLI and Ansible Gcloud interactions.

We suggest you set your default infra_type in profile.yml to match your preferred default Public Cloud Infrastructure credentials.

CDP Private Cloud

For CDP Private Cloud, you will need a valid Cloudera license file in order to download the software from the Cloudera repositories. We suggest storing this file in your user profile in ~/.cdp/ and setting its path in the profile.yml config file.

If you are also using Public Cloud infrastructure to host your CDP Private Cloud clusters, then you will need those credentials as well.

SSH Host Key Checking

For CDP Private Cloud clusters and other direct inventory scenarios, you will need to manage SSH host key validation appropriate to your specific environment.

Be advised! By default, the quickstart.sh script explicitly sets the ANSIBLE_HOST_KEY_CHECKING variable to False for ease-of-use with an introductory deployment. However, this setting is not recommended for any other deployment type. For all other deployment types, you should directly manage your SSH host key checking.

A common approach is to create your own "startup" script, using quickstart.sh as a template, and set the appropriate Ansible SSH configuration variables.
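
A fragment of such a startup script might look like the sketch below. ANSIBLE_HOST_KEY_CHECKING and ANSIBLE_SSH_ARGS are standard Ansible environment variables; the known_hosts file location is a hypothetical example, and the exact options you need will depend on your environment.

```shell
#!/bin/sh
# Keep SSH host key checking enabled instead of disabling it as the
# introductory quickstart.sh does.
export ANSIBLE_HOST_KEY_CHECKING=True

# Hypothetical location for a project-specific known_hosts file.
KNOWN_HOSTS_FILE="${HOME}/.ssh/known_hosts_cloudera_deploy"

# Setting ANSIBLE_SSH_ARGS replaces Ansible's default SSH arguments, so the
# usual connection-sharing options are repeated here before the host key
# options are added.
export ANSIBLE_SSH_ARGS="-C -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=yes -o UserKnownHostsFile=${KNOWN_HOSTS_FILE}"
```

With strict checking, each target host key must already be present in the known_hosts file before the playbook runs.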

In some scenarios, for example, a reused pool of dynamic hosts within a development OpenStack environment, you might wish to manage this control from your host machine’s SSH config file. For example:

# ~/.ssh/config

# Disable host key checking only for your specific environment
Host *.your.development.domain
   StrictHostKeyChecking no

These settings will flow from your host to the Docker container’s environment.

Execution

Cloudera Deploy uses a single entrypoint playbook, main.yml, that examines the user-provided profile details, a deployment definition, and any optional Ansible tags, and then runs the appropriate actions. At minimum, you execute a deployment like so:

ansible-playbook <location of cloudera-deploy>/main.yml \
  -e "definition_path=<absolute path, or path relative to main.yml, of the definition directory>"

Note
The location defined by definition_path is interpreted relative to the location of the main.yml playbook; it can also be an absolute location.
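
The path rule in the note above can be illustrated with a short sketch. This only mimics the rule for demonstration purposes; it is not how Cloudera Deploy itself resolves the path, and the function name resolve_definition_path and the /srv/definitions location are hypothetical.

```shell
#!/bin/sh
# Illustration only: absolute definition paths are used as-is; relative
# paths are taken from the directory that contains the main.yml playbook.
resolve_definition_path() {
  playbook="$1"
  definition="$2"
  case "$definition" in
    /*) echo "$definition" ;;
    *)  echo "$(dirname "$playbook")/$definition" ;;
  esac
}

resolve_definition_path /opt/cloudera-deploy/main.yml examples/sandbox
# prints /opt/cloudera-deploy/examples/sandbox
resolve_definition_path /opt/cloudera-deploy/main.yml /srv/definitions/my_example
# prints /srv/definitions/my_example
```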

Tags

Cloudera Deploy exposes a set of tags that allows fine-grained inclusion and exclusion of functions, in particular, a runlevel-like management process.

Table 1. Partial List of Available Execution Tags

  Tag           Description
  infra         Infrastructure (cloud provider assets)
  plat          Platform (CDP Public Cloud Datalakes). Assumes infra.
  run           Runtime (CDP Public Cloud experiences, e.g. Cloudera Machine Learning (CML)). Assumes infra and plat.
  full_cluster  CDP Private Cloud Base Clusters.

Current Tags: verify_inventory, verify, full_cluster, default_cluster, verify_definition, custom_repo, verify_parcels, database, security, kerberos, tls, ha, os, users, jdk, mysql_connector, oracle_connector, fetch_ca, cm, license, autotls, prereqs, restart_agents, heartbeat, mgmt, preload_parcels, kts, kms, restart_stale, teardown_ca, teardown_all, teardown_tls, teardown_cluster, infra, init, plat, run, validate

With these tags, you can set your deployment to a given "runlevel" state:

# Ensure only the infrastructure layer is available
ansible-playbook main.yml -e "definition_path=my_example" -t infra

or select or skip a level or function:

# Ensure the platform and runtimes are available, but skip any infrastructure
ansible-playbook main.yml -e "definition_path=my_example" -t run --skip-tags infra

For details on the various runlevel-like tags for CDP Public Cloud, see the Runlevel Guide in the cloudera.exe project.
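
If you switch between runlevels often, a small wrapper can help keep the tag arguments consistent. The sketch below only prints the command it would run, so you can review it first; the function name deploy_cmd is hypothetical and not part of Cloudera Deploy.

```shell
#!/bin/sh
# Hypothetical wrapper: print (rather than run) the ansible-playbook command
# for a definition, a comma-separated list of tags, and an optional
# comma-separated list of tags to skip.
deploy_cmd() {
  definition="$1"
  tags="$2"
  skip="${3:-}"
  cmd="ansible-playbook main.yml -e \"definition_path=${definition}\" -t ${tags}"
  if [ -n "$skip" ]; then
    cmd="${cmd} --skip-tags ${skip}"
  fi
  printf '%s\n' "$cmd"
}

deploy_cmd my_example infra
# prints: ansible-playbook main.yml -e "definition_path=my_example" -t infra
deploy_cmd my_example plat,run infra
# prints: ansible-playbook main.yml -e "definition_path=my_example" -t plat,run --skip-tags infra
```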

Definitions

Cloudera Deploy uses a set of configuration files within a directory to define and coordinate a deployment. This directory also stores any artifacts created during the deployment, such as Ansible inventory files, CDP environment readouts, etc.

The main.yml entrypoint playbook expects the runtime variable definition_path, which should point to the directory hosting these configuration files, given either as an absolute path or as a path relative to the playbook.

Within the directory, you must supply the following files:

  • definition.yml

  • application.yml

Optionally, if you are deploying a CDP Private Cloud cluster or need to set up ad hoc IaaS infrastructure, you can supply the following:

  • inventory_static.ini

  • inventory_template.ini

The definition directory can host any other file or asset, such as data files, additional configuration details, and additional playbooks. However, Cloudera Deploy will not operate unless the definition.yml and application.yml files are present.
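
Because both files are mandatory, it can be useful to check a definition directory before starting a run. The following is a minimal sketch of such a check; the function name check_definition_dir is hypothetical.

```shell
#!/bin/sh
# Hypothetical preflight check: confirm that a definition directory contains
# the two files Cloudera Deploy requires. Returns non-zero if either is missing.
check_definition_dir() {
  dir="$1"
  rc=0
  for required in definition.yml application.yml; do
    if [ ! -f "${dir}/${required}" ]; then
      echo "Missing ${required} in ${dir}" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Example (not executed here):
#   check_definition_dir examples/sandbox && \
#     ansible-playbook main.yml -e "definition_path=examples/sandbox"
```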

definition.yml

The required definition.yml file contains top-level configuration keys that define and direct the deployment.

Table 2. Top-Level Configuration Keys

  Key       Description
  infra     Hosting infrastructure to manage
  env       CDP Public Cloud Environment deployment (on the infrastructure)
  clusters  CDP Private Cloud Cluster deployment (on the infrastructure)
  mgmt
  hosts

Within the top-level keys, you may override the defaults appropriate to that section.

You may also add other top-level configuration keys if your automation requires it, e.g. if your application.yml playbook needs its own configuration details.

More detailed documentation of all the options is beyond the scope of this introductory readme; further documentation is forthcoming.

application.yml

The required application.yml file is not a configuration file; it is an Ansible playbook. At minimum, this playbook requires a single Ansible play. A basic no-op task works well if you wish to take no additional actions beyond the core deployment.

For more sophisticated post-deployment activities, you can expand this playbook as much as needed. For example, the playbook can interact with hosts and inventory, execute computing jobs on deployment environments, and include additional playbooks and configuration files.

Note
This file is a standard Ansible playbook, and when it is executed (via import_playbook) by the main.yml entrypoint, the working directory of the Ansible executable is changed to the directory of the application.yml playbook.
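
For the no-op case described above, a sketch like the following could be used to create the file. The function name write_noop_application and the play and task names are hypothetical; ansible.builtin.debug is a standard Ansible module that only prints a message.

```shell
#!/bin/sh
# Hypothetical helper: write a minimal no-op application.yml into the given
# definition directory, creating the directory if needed.
write_noop_application() {
  dir="$1"
  mkdir -p "$dir"
  cat > "${dir}/application.yml" <<'EOF'
---
# A single play with one task that changes nothing.
- name: No-op application play
  hosts: localhost
  connection: local
  gather_facts: false
  tasks:
    - name: Take no additional action
      ansible.builtin.debug:
        msg: "No post-deployment actions defined"
EOF
}

# Example (not executed here):
#   write_noop_application ./my_example
```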

inventory_static.ini

You may also include an inventory_static.ini file that describes your static Ansible inventory. This file will be automatically loaded and added to the Ansible inventory. Note that you can also use the standard Ansible -i switch to include other static inventory.

inventory_template.ini

If a definition includes an inventory_template.ini file, which describes a set of dynamic hosts, Cloudera Deploy will provision these hosts as infrastructure for the deployment, typically for a CDP Private Cloud cluster.

Note
This currently only works on AWS.

Getting Involved

Contribution instructions are coming soon!

Copyright 2021, Cloudera, Inc.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.