Puppet module to deploy Cloudera Manager and Cloudera's Distribution, including Apache Hadoop (CDH).

Cloudera Manager

Table of Contents

  1. Overview
  2. Module Description - What the module does and why it is useful
  3. Setup - The basics of getting started with this module
  4. Usage - Configuration options and additional functionality
  5. Reference - An under-the-hood peek at what the module is doing and how
  6. Limitations - OS compatibility, etc.
  7. Development - Guide for contributing to the module

Overview

This Puppet module manages the installation and configuration of Cloudera Manager, a management application for Apache Hadoop, on the operating systems that Cloudera officially supports.

Module Description

This module manages the installation of Cloudera Manager, a management application for Apache Hadoop. It follows the standards written in the Cloudera Manager Installation Guide "Installation Path B - Installation Using Your Own Method". By default, this module assumes that parcels will be used to deploy Cloudera's Distribution of Apache Hadoop (CDH) and related software. If parcels are not desired, this module can also manage the installation of CDH including HDFS & MapReduce, Impala, Sentry, Search, Spark, HBase, and LZO compression. The module can also configure TLS security of the Cloudera Manager communications channels, and set up Cloudera Manager to use an alternative to the embedded database.

This module is Cloudera Certified on Cloudera 5.

Setup

What this module affects

  • Installs the Cloudera software repository for CM.
  • Installs Oracle Java Development Kit (JDK) 7.
  • Optionally installs the Oracle Java Cryptography Extensions.
  • Installs the CM agent.
  • Configures the CM agent to talk to a CM server.
  • Starts the CM agent.
  • Sets the kernel vm.swappiness to 0.
  • Disables the kernel transparent hugepage compaction.
  • Separately installs the CM server and database connectivity (by default to the embedded database server).
  • Separately starts the CM server.
  • Optionally installs the Cloudera software repository for CDH.
  • Optionally installs most components of CDH 5 including HBase, Impala, Search, and Spark.
  • Optionally installs GPL Extras (LZO).

Requirements

Please read through the Cloudera Manager Requirements document to discover all of the entities (i.e. operating systems, databases, and browsers) supported by Cloudera Manager. Pay close attention to the Resource Requirements and the Networking and Security Requirements sections. There are a number of requirements that this module cannot easily configure for your environment (e.g. no blocking by Security-Enhanced Linux (SELinux)) and that you must ensure are correct on your platform.

Beginning with this module

Most nodes that will be a part of a Hadoop cluster will use this declaration.

class { '::cloudera':
  cm_server_host => 'smhost.localdomain',
}

The node that will be the CM server (i.e. smhost.localdomain) will use this declaration. It should be included on only one node in your environment. By default it will install the embedded PostgreSQL database on the same node. With the correct parameters, it can instead connect to a local or remote MySQL, PostgreSQL, or Oracle RDBMS database.

class { '::cloudera':
  cm_server_host   => 'smhost.localdomain',
  install_cmserver => true,
}
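
As a rough sketch of how the two declarations above might sit together in one site manifest, the following uses node definitions. The worker host naming scheme and the regular expression are assumptions made for illustration; only the class parameters come from this module.

```puppet
# site.pp sketch. The node names are hypothetical.

# The single node that runs the CM server (and an agent).
node 'smhost.localdomain' {
  class { '::cloudera':
    cm_server_host   => 'smhost.localdomain',
    install_cmserver => true,
  }
}

# Every other cluster member runs only the CM agent.
node /^worker\d+\.localdomain$/ {
  class { '::cloudera':
    cm_server_host => 'smhost.localdomain',
  }
}
```

Keeping install_cmserver out of the shared declaration makes it harder to enable the server on more than one node by accident.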

Upgrading

Deprecation Warning

  • The default for use_parcels will switch to true before the 1.0.0 release.

This:

class { '::cloudera':
  cm_server_host => 'smhost.localdomain',
}

would become this:

class { '::cloudera':
  cm_server_host => 'smhost.localdomain',
  use_parcels    => false,
}

  • The puppetlabs/mysql dependency will update to version 2 before the 1.0.0 release. Make sure to review its changelog in the case of an upgrade.

  • The class ::cloudera::repo will be renamed to ::cloudera::cdh::repo and the Impala repository will be split out into ::cloudera::impala::repo before the 1.0.0 release.

This:

class { '::cloudera::repo':
  cdh_version => '4.1',
  cm_version  => '4.1',
}

would become this:

class { '::cloudera::cdh::repo':
  version => '4.1',
}
class { '::cloudera::impala::repo':
  version => '4.1',
}

  • The class parameters and variables yumserver and yumpath have been renamed to reposerver and repopath respectively for the 2.0.0 release. This makes the names more generic, as they apply to APT and Zypprepo as well as YUM package repositories.

This:

class { 'cloudera':
  cm_yumserver => 'http://packageserver.localdomain',
  cm_yumpath   => '/gplextras/',
}

would become this:

class { 'cloudera':
  cm_reposerver => 'http://packageserver.localdomain',
  cm_repopath   => '/gplextras/',
}

  • The use_gplextras parameter has been renamed to install_lzo for the 2.0.0 release.

This:

class { 'cloudera':
  cm_server_host => 'smhost.example.com',
  use_gplextras  => true,
}

would become this:

class { 'cloudera':
  cm_server_host => 'smhost.example.com',
  install_lzo    => true,
}

Usage

All interaction with the cloudera module can be done through the main cloudera class. This means you can simply toggle the options in ::cloudera to have full functionality of the module.

TLS Security

Level 1: Configuring TLS Encryption only for Cloudera Manager

Level 2: Configuring TLS Authentication of Server to Agents

Level 3: Configuring TLS Authentication of Agents to Server

This module's deployment of TLS provides both level 1 and level 2 configuration (encryption, and authentication of the server to the agents). Level 3 is not presently implemented. You will need to provide a TLS certificate and the signing certificate authority for the CM server. See the File resources in the example below for where the files need to be deployed.

Some settings inside CM can only be configured manually. See the Level 1 instructions for the details of what to change in the WebUI, and use the values below:

Setting                              Value
Use TLS Encryption for Agents        (check)
Path to TLS Keystore File            /etc/cloudera-scm-server/keystore
Keystore Password                    The value of server_keypw in Class['::cloudera::cm5::server'].
Use TLS Encryption for Admin Console (check)

The node that will be the CM agent may use this declaration:

class { '::cloudera':
  cm_server_host => 'smhost.localdomain',
  use_tls        => true,
  install_jce    => true,
}
file { '/etc/pki/tls/certs/cloudera_manager.crt': }

The node that will be the CM agent+server may use this declaration:

class { '::cloudera':
  cm_server_host   => 'smhost.localdomain',
  install_cmserver => true,
  use_tls          => true,
  install_jce      => true,
  server_keypw     => 'myPassWord',
}
file { '/etc/pki/tls/certs/cloudera_manager.crt': }
file { '/etc/pki/tls/certs/cloudera_manager-ca.crt': }
file { "/etc/pki/tls/certs/${::fqdn}-cloudera_manager.crt": }
file { "/etc/pki/tls/private/${::fqdn}-cloudera_manager.key": }
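
The File resources above are shown without attributes. A minimal sketch of how they might be filled in on the CM server node follows, assuming the PEM files are served from a hypothetical site_tls module. The module name, ownership, and modes are illustrative assumptions and are not defined by this module; cm_server_host is the parameter name documented in the Reference section.

```puppet
# Sketch only: 'site_tls' is a hypothetical module holding your PEM files.
class { '::cloudera':
  cm_server_host   => 'smhost.localdomain',
  install_cmserver => true,
  use_tls          => true,
  install_jce      => true,
  server_keypw     => 'myPassWord',
}

# Server certificate plus signing chain, also used by agents to verify the server.
file { '/etc/pki/tls/certs/cloudera_manager.crt':
  ensure => file,
  mode   => '0644',
  source => 'puppet:///modules/site_tls/cloudera_manager.crt',
}

# Certificate authority that signed the server certificate.
file { '/etc/pki/tls/certs/cloudera_manager-ca.crt':
  ensure => file,
  mode   => '0644',
  source => 'puppet:///modules/site_tls/cloudera_manager-ca.crt',
}

# Public certificate of this CM server.
file { "/etc/pki/tls/certs/${::fqdn}-cloudera_manager.crt":
  ensure => file,
  mode   => '0644',
  source => "puppet:///modules/site_tls/${::fqdn}-cloudera_manager.crt",
}

# Private key of this CM server. Keep it readable by root only (assumed mode).
file { "/etc/pki/tls/private/${::fqdn}-cloudera_manager.key":
  ensure => file,
  owner  => 'root',
  group  => 'root',
  mode   => '0600',
  source => "puppet:///modules/site_tls/${::fqdn}-cloudera_manager.key",
}
```

No ordering relationships are declared here because it is not stated how the module orders its own TLS resources. Add them only after checking the module source.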

External Database

If you decide not to use the embedded database, the Cloudera Manager server database configuration can be completed by configuring this module to call the scm_prepare_database.sh script. The external database must be configured and ready for connection with the supplied credentials via some method outside of this module.

class { '::cloudera':
  cm_server_host   => 'smhost.localdomain',
  install_cmserver => true,
  db_type          => 'postgresql',
  db_host          => 'dbhost.localdomain',
  db_port          => '5432',
  db_user          => 'root',
  db_pass          => 'SeCrEt',
}
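
For comparison, here is a sketch of the same idea against a MySQL server, using only parameters listed in the Reference section. The host name and passwords are placeholders, and the split between the administrative account (db_user, db_pass) and the Cloudera Manager account (username, password) follows the parameter descriptions rather than any tested deployment.

```puppet
class { '::cloudera':
  cm_server_host   => 'smhost.localdomain',
  install_cmserver => true,
  db_type          => 'mysql',
  db_host          => 'dbhost.localdomain',
  db_port          => '3306',
  # Administrative database account on db_host.
  db_user          => 'root',
  db_pass          => 'SeCrEt',
  # Database and account that Cloudera Manager itself connects with.
  database_name    => 'scm',
  username         => 'scm',
  password         => 'ChangeMe',
}
```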

Parcels

Parcels are an alternative binary distribution format, supported by Cloudera Manager 4.5+, that simplifies distribution of CDH and other Cloudera products. By default, this module assumes software deployment of CDH via parcels. To have CDH installed via RPMs (or DEBs) instead of parcels, set use_parcels => false.

Nodes that will be cluster members will use this declaration:

class { '::cloudera':
  cm_server_host => 'smhost.localdomain',
  use_parcels    => false,
}

For more advanced use cases, nodes that will be gateways may use this declaration to install extra parts of CDH:

class { '::cloudera':
  cm_server_host => 'smhost.localdomain',
  use_parcels    => false,
}
class { '::cloudera::cdh5::mahout': }
class { '::cloudera::cdh5::kite': }
# Install Oozie WebUI support (optional):
class { '::cloudera::cdh5::oozie::ext': }
# Install MySQL support (optional):
class { '::cloudera::cdh5::hue::mysql': }
class { '::cloudera::cdh5::oozie::mysql': }

For more advanced use cases, the node that will be just the CM server may use this declaration. (This will skip installation of the CDH software, as it is not required.)

class { '::cloudera::cm5::repo': } ->
class { '::cloudera::java5': } ->
class { '::cloudera::java5::jce': } ->
class { '::cloudera::cm5': } ->
class { '::cloudera::cm5::server': }

LZO Compression

Hadoop-specific LZO compression libraries are available in the Cloudera GPL Extras repository. To deploy the Hadoop-specific and also the native libraries on a non-parcel system just add install_lzo => true to the class declaration. Additional configuration in Cloudera Manager will be required to activate the functionality (ignore the mention of parcels in the link to the documentation).

class { '::cloudera':
  cm_server_host => 'smhost.localdomain',
  use_parcels    => false,
  install_lzo    => true,
}

To deploy the native LZO compression libraries on a parcel system just add install_lzo => true to the class declaration. Additional configuration in Cloudera Manager will be required to activate the functionality.

class { '::cloudera':
  cm_server_host => 'smhost.localdomain',
  use_parcels    => true,
  install_lzo    => true,
}

Reference

Classes

Public Classes

  • cloudera: Installs and configures Cloudera Manager. Includes most other classes.

Private Classes

  • cloudera::java5: Installs the Oracle Java Development Kit (JDK) from the Cloudera Manager repository.
  • cloudera::java5::jce: Installs the Oracle Java Cryptography Extension (JCE) unlimited strength jurisdiction policy files.
  • cloudera::cm5
  • cloudera::cm5::repo
  • cloudera::cm5::server
  • cloudera::cdh5
  • cloudera::cdh5::repo
  • cloudera::gplextras5
  • cloudera::gplextras5::repo
  • cloudera::java: Installs the Oracle Java Development Kit (JDK) from the Cloudera Manager repository.
  • cloudera::java::jce: Installs the Oracle Java Cryptography Extension (JCE) unlimited strength jurisdiction policy files.
  • cloudera::cm
  • cloudera::cm::repo
  • cloudera::cm::server
  • cloudera::cdh
  • cloudera::cdh::repo
  • cloudera::gplextras
  • cloudera::gplextras::repo
  • cloudera::impala
  • cloudera::impala::repo
  • cloudera::search
  • cloudera::search::repo
  • cloudera::lzo

Class: cloudera

Parameters

ensure

Ensure if present or absent. Default: present

autoupgrade

Upgrade package automatically, if there is a newer version. Default: false

service_ensure

Ensure if service is running or stopped. Default: running

service_enable

Start service at boot. Default: true

cdh_reposerver

URI of the YUM server. Default: http://archive.cloudera.com

cdh_repopath

The path to add to the $cdh_reposerver URI. Only set this if your platform is not supported or you know what you are doing. Default: auto-set, platform specific

cdh_version

The version of Cloudera's Distribution, including Apache Hadoop to install. Default: 5

cm_reposerver

URI of the YUM server. Default: http://archive.cloudera.com

cm_repopath

The path to add to the $cm_reposerver URI. Only set this if your platform is not supported or you know what you are doing. Default: auto-set, platform specific

cm_version

The version of Cloudera Manager to install. Default: 5

cm5_repopath

The path to add to the $cm_reposerver URI. Only set this if your platform is not supported or you know what you are doing. Default: auto-set, platform specific

ci_reposerver

URI of the YUM server. Default: http://archive.cloudera.com

ci_repopath

The path to add to the $ci_reposerver URI. Only set this if your platform is not supported or you know what you are doing. Default: auto-set, platform specific

ci_version

The version of Cloudera Impala to install. Default: 1

cs_reposerver

URI of the YUM server. Default: http://archive.cloudera.com

cs_repopath

The path to add to the $cs_reposerver URI. Only set this if your platform is not supported or you know what you are doing. Default: auto-set, platform specific

cs_version

The version of Cloudera Search to install. Default: 1

cg_reposerver

URI of the YUM server. Default: http://archive.cloudera.com

cg_repopath

The path to add to the $cg_reposerver URI. Only set this if your platform is not supported or you know what you are doing. Default: auto-set, platform specific

cg_version

The version of Cloudera GPL Extras to install. Default: 5

cm_server_host

Hostname of the Cloudera Manager server. Default: localhost

cm_server_port

Port on which the Cloudera Manager server is listening. Default: 7182

use_tls

Whether to enable TLS on the Cloudera Manager server and agent. Default: false

verify_cert_file

The file holding the public key of the Cloudera Manager server as well as the chain of signing certificate authorities. PEM format. Default: /etc/pki/tls/certs/cloudera_manager.crt or /etc/ssl/certs/cloudera_manager.crt

use_parcels

Whether to install CDH software via parcels or packages. Default: true

install_lzo

Whether to install the native LZO compression library packages. If use_parcels is false, then also install the Hadoop-specific LZO compression library packages. You must configure and deploy the GPLextras parcel repository if use_parcels is true. Default: false

install_java

Whether to install the Cloudera supplied Oracle Java Development Kit. If this is set to false, then an Oracle JDK will have to be installed prior to applying this module. Default: true

install_jce

Whether to install the Oracle Java Cryptography Extension unlimited strength jurisdiction policy files. This requires manual download of the zip file. See files/README_JCE.md for download instructions. Default: false

install_cmserver

Whether to install the Cloudera Manager Server. This should only be set to true on one host in your environment. Default: false

database_name

Name of the database to use for Cloudera Manager. Default: scm

username

Name of the user to use to connect to database_name. Default: scm

password

Password to use to connect to database_name. Default: scm

db_host

Host to connect to for database_name. Default: localhost

db_port

Port on db_host to connect to for database_name. Default: 3306

db_user

Administrative database user on db_host. Default: root

db_pass

Administrative database user db_user password. Default:

db_type

Which type of database to use for Cloudera Manager. Valid options are embedded, mysql, oracle, or postgresql. Default: embedded

server_ca_file

The file holding the PEM public key of the Cloudera Manager server certificate authority. Default: /etc/pki/tls/certs/cloudera_manager-ca.crt or /etc/ssl/certs/cloudera_manager-ca.crt

server_cert_file

The file holding the PEM public key of the Cloudera Manager server. Default: /etc/pki/tls/certs/${::fqdn}-cloudera_manager.crt or /etc/ssl/certs/${::fqdn}-cloudera_manager.crt

server_key_file

The file holding the PEM private key of the Cloudera Manager server. Default: /etc/pki/tls/private/${::fqdn}-cloudera_manager.key or /etc/ssl/private/${::fqdn}-cloudera_manager.key

server_chain_file

The file holding the PEM public key(s) of the Cloudera Manager server's intermediate certificate authority. Default: none

server_keypw

The password used to protect the keystore. Default: none

proxy

The URL to the proxy server for the YUM repositories. Default: absent

proxy_username

The username for the YUM proxy. Default: absent

proxy_password

The password for the YUM proxy. Default: absent

parcel_dir

The directory where parcels are downloaded and distributed. Default: /opt/cloudera/parcels
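
As an illustration of how the repository and proxy parameters above might be combined, the sketch below shows two alternatives: pointing the CM and CDH repositories at an internal mirror, or reaching the default repositories through a proxy. The node names, mirror URL, proxy URL, and credentials are all hypothetical.

```puppet
# Alternative 1: install packages from an internal mirror (hypothetical URL).
node 'worker01.localdomain' {
  class { '::cloudera':
    cm_server_host => 'smhost.localdomain',
    use_parcels    => false,
    cm_reposerver  => 'http://mirror.localdomain',
    cdh_reposerver => 'http://mirror.localdomain',
  }
}

# Alternative 2: keep the default repositories but go through a proxy.
node 'worker02.localdomain' {
  class { '::cloudera':
    cm_server_host => 'smhost.localdomain',
    proxy          => 'http://proxy.localdomain:3128',
    proxy_username => 'repouser',
    proxy_password => 'SeCrEt',
  }
}
```

The repopath parameters are left at their platform-specific defaults, so a mirror used this way would need to reproduce the upstream directory layout.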

Limitations

OS Support:

This module targets the operating systems that Cloudera officially supports for CM4 and for CM5.

  • RedHat family - tested on CentOS 5.9, CentOS 6.4
  • SuSE family - tested on SLES 11SP3
  • Debian family - tested on Debian 6.0.7, Debian 7.0, Ubuntu 10.04.4 LTS, and Ubuntu 12.04.2 LTS

Software Support:

  • Cloudera Manager - tested with 4.1.2, 4.8.0, and 5.0.0beta2
  • CDH - tested with 4.1.2, 4.5.0, and 5.0.0beta2
  • Cloudera Impala - tested with 1.0 and 1.2.3
  • Cloudera Search - tested with 1.1.0
  • Cloudera GPL Extras - tested with 4.3.0 and 5.0.0

Notes:

  • Supports Top Scope variables (i.e. via Dashboard) and Parameterized Classes.
  • Based on the Cloudera Manager 5.0.0 Beta 2 Installation Guide
  • TLS certificates must be in PEM format and are not deployed by this module.
  • When using parcels, the CDH software is not deployed by Puppet. Puppet will only install the Cloudera Manager server/agent. You must then configure Cloudera Manager to deploy the parcels.
  • When installing packages and not parcels on SLES, SP2 is required as the hadoop-2.0.0+1518-1.cdh4.5.0.p0.24.sles11.x86_64 package requires netcat-openbsd which is not available on SLES 11SP1.
  • Osfamily RedHat 5 requires the EPEL YUM repository when installing LZO support.
  • This module does not support upgrading from CDH4 to CDH5 packages, including Impala, Search, and GPL Extras.

Issues:

  • Need external module support for the Oracle Instant Client JDBC.
  • When using an external PostgreSQL server that is on the same host as the CM server, PostgreSQL must be configured to accept connections with md5 password authentication.
  • Osfamily RedHat 5 requires Python 2.6 from the EPEL YUM repository when installing the Hue service.
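
For the PostgreSQL issue above, one possible way to allow md5 password authentication is the pg_hba_rule defined type from the puppetlabs/postgresql module. That module is not a documented dependency of this one, so treat the following as a sketch under that assumption. The database and user names mirror the database_name and username defaults.

```puppet
# Assumes the puppetlabs/postgresql module manages the local PostgreSQL server.
class { '::postgresql::server': }

# Let the Cloudera Manager account connect over local TCP using md5 passwords.
postgresql::server::pg_hba_rule { 'cloudera manager md5 access':
  description => 'Cloudera Manager database access',
  type        => 'host',
  database    => 'scm',
  user        => 'scm',
  address     => '127.0.0.1/32',
  auth_method => 'md5',
}
```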

TODO:

See TODO.md for more items.

Development

Please see CONTRIBUTING.md for information on how to contribute.

Copyright (C) 2013 Mike Arnold mike@razorsedge.org

Licensed under the Apache License, Version 2.0.

razorsedge/puppet-cloudera on GitHub

razorsedge/cloudera on Puppet Forge