DEK! is a tool for managing versions and configurations of multiple clients (HDFS, HBase, MongoDB…) on Windows and most Unix-based systems, so that you can easily connect to multiple clusters. It provides a convenient Command Line Interface (CLI) and an API for installing, switching, removing and listing clusters and their associated clients.
Formerly known as HHSL, the Hadoop Hyper Shell, DEK! was inspired by the very useful RVM and rbenv tools, widely used by the Ruby community. It has been developed by Hadoop sysadmins and developers who deal with one or many datasources or clusters and no longer want to do the tedious work of configuring every client for each environment.
By Data Engineers, for Data Engineers: Making life easier. No more trawling download pages, extracting archives, or messing with $HOME/$PATH/$HADOOP_HOME/… environment variables. DEK! allows you to easily connect to multiple Hadoop clusters by downloading the right clients, setting all the environment variables and preparing the -defaults.xml/-site.xml files.
Extensible: Install data clients such as HDFS, HBase, Spark, ZooKeeper, Oozie, Sqoop and Hive; many others are also supported.
Lightweight: Written in Java; it only requires a JRE to be present on your system.
Multi-platform: Runs on Windows and any Unix-based platform (Mac OS X, Linux, Cygwin, Solaris and FreeBSD).
- Download the tarball and extract it.
- Optional: add the DEK directory to your PATH environment variable.
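The optional PATH step could look like the following sketch for a POSIX shell. Both `DEK_INSTALL_DIR` and the `bin/` subdirectory are assumptions made for illustration; check the layout of the extracted archive and adjust the paths accordingly.

```shell
# Hypothetical location where the DEK tarball was extracted; adjust as needed.
DEK_INSTALL_DIR="${DEK_INSTALL_DIR:-$HOME/tools/dek}"

# Assumption: the launcher script lives in a bin/ subdirectory of the archive.
DEK_BIN_DIR="$DEK_INSTALL_DIR/bin"

# Prepend the directory to PATH only if it is not already listed.
case ":$PATH:" in
  *":$DEK_BIN_DIR:"*) ;;
  *) PATH="$DEK_BIN_DIR:$PATH" ;;
esac
export PATH
```

Placing these lines in your shell profile makes the change persistent across sessions.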
DEK uses a secure store to manage secrets and credentials for registries and resolvers. Most DEK resolvers require credentials to interact with the third-party service that manages their configuration: Ambari for Hortonworks, Cloudera Manager for Cloudera… DEK has a simple mechanism to protect secrets: by default it uses a secure store implementation based on an encrypted H2 database (see plugins/local-store). Therefore, if you want to use DEK, you need to specify a key to unlock the store. The first time you use DEK, you will be prompted to enter the root password.
With DEK, you can unlock the store in 4 ways with the --unlock option:
- direct password
- prompt, if you enter @prompt
- file, if you enter an option starting with "file://…"
- default secret file, if a file named 'secret' exists in the root path of DEK
The preferred way to unlock the secured store is @prompt, which asks the user to enter the root password:
dek ... --unlock @prompt
Root password:>
For background processes, or to supply the root password automatically, you can use the file (or default secret file) or direct password options:
dek ... --unlock <some password>
dek ... --unlock file:///...
Be aware that your commands will be stored in your shell history if you use the direct password option. The file and default secret file options let you use a file containing the root password. Make sure to protect this file with restrictive ownership and permissions (chown + chmod 700).
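One way to prepare such a password file is sketched below. It is not part of DEK: `write_secret_file` is a hypothetical helper, and whether DEK tolerates a trailing newline in the file is not stated here, so the sketch writes none. It applies mode 600 (owner read/write only), which is tighter than 700 because a password file does not need the execute bit.

```shell
# Store a root password in a file readable only by the current user.
# The password is read from standard input, so it never appears in the
# shell history or in the process list.
write_secret_file() {
  secret_file="$1"
  # Read one line; also accept input that lacks a trailing newline.
  IFS= read -r secret_value || [ -n "$secret_value" ]
  (
    umask 077                       # a newly created file gets rw-------
    printf '%s' "$secret_value" > "$secret_file"
  )
  chmod 600 "$secret_file"          # also tighten a pre-existing file
  unset secret_value
}
```

For interactive use, something like `stty -echo; write_secret_file <path>; stty echo` keeps the typed password off the screen. The resulting file can then be passed with the `--unlock file://…` form described above.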
> dek init
password: ****
retype password: ****
Do you want to store the password in ~/.dek/secret?[Y/n]
Initializing the database defines a local registry. A registry holds references to clusters, and DEK can refer to multiple registries. By default, every command that takes a registry uses this local registry.
Display the info:
> dek info
-----------------------------------------------------------------------------
DEK Version: 0.3-SNAPSHOT
Commit: 181c0a9e0415efe0fd01b3cc088f9e97ed78845a ()
Build: 2018-11-15T07:29:44+0000
Terminal
- size: Size[cols=77, rows=30]
- class: PosixSysTerminal
- type: xterm-256color
- attributes: Attributes[lflags: echoke echoe echok echo echoctl isig icanon, iflags: brkint icrnl ixon ixany imaxbel iutf8, oflags: opost onlcr, cflags: cs6 cs7 cs8 cread hupcl, cchars: eof=^D eol=<undef> eol2=<undef> erase=^? werase=^W kill=^U reprint=^R intr=^C quit=^\ susp=^Z dsusp=<undef> start=^Q stop=^S lnext=^V discard=^O min=1 time=0 status=<undef>]
-----------------------------------------------------------------------------
You can then add a cluster. For instance, we add a cluster managed by Ambari, available at http://ambari:8080 and named PROD-HDP3.0:
dek add cluster local ambari PROD-HDP3.0 http://ambari:8080
> user:
> password:
#or
dek add cluster local hadoop-unit LOCAL-HU file:///home/devil/tools/hadoop-unit-standalone-2.9
#or
dek add cluster local cloudera PROD-CDH http://cloudera-manager:8080
> user:
> password:
Verify that the cluster has been added:
dek list cluster
┌────────┬───────────┬──────┬──────────────────┐
│Registry│Name       │Type  │URI               │
├────────┼───────────┼──────┼──────────────────┤
│local │PROD-HDP3.0│ambari│http://ambari:8080│
└────────┴───────────┴──────┴──────────────────┘
Now use it:
dek use PROD-HDP3.0
# or dek use cluster PROD-HDP3.0
The 'use' command downloads the missing clients, prepares the environment variables and renders the configuration files. You can check the variables with the 'env' command, on Linux for instance:
env
....
HADOOP_HOME=/xxxxx
SPARK_HOME=/xxxx
HBASE_HOME=/xxxxx
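To narrow the `env` output down to the client home directories, a small filter such as the one below may help. It is a generic shell sketch, not a DEK command: `list_client_homes` is a hypothetical helper that simply matches every variable whose name ends in `_HOME`, so unrelated entries such as `JAVA_HOME` will show up too.

```shell
# Print only the *_HOME variables of the current environment, sorted by name.
list_client_homes() {
  env | grep -E '^[A-Z0-9_]+_HOME=' | sort
}

list_client_homes
```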
Also have a look at the configuration files:
cat
DEK can also be used to install binaries only, i.e. without cluster configuration or environment management:
dek install hadoop 2.7.7
By default, DEK stores all its data in ~/.dek (this will be configurable in future releases, see Roadmap):
~/.dek
archives: contains all the binaries, compressed and uncompressed.
confs: contains all the configurations generated when the command 'use cluster …' is called. confs is a multi-level directory structured like this: registryconnection id > cluster id > service name
db: the content of the local db
Since both /archives and /confs contain only generated content, these directories can be wiped without fear: their content will be regenerated on the next call to 'use cluster …'.
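Wiping the generated directories can be done by hand; the sketch below wraps it in a hypothetical helper, `clean_dek_generated`, which is not part of DEK. It assumes the default ~/.dek location unless another directory is passed, and it leaves the db directory alone.

```shell
# Remove DEK's regenerable directories; the db directory is left untouched.
# The optional first argument is the DEK data directory (defaults to ~/.dek).
clean_dek_generated() {
  dek_data_dir="${1:-$HOME/.dek}"
  rm -rf -- "$dek_data_dir/archives" "$dek_data_dir/confs"
}

# Example (commented out on purpose, since it deletes files):
# clean_dek_generated
```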
Not currently implemented
Property | Default value | Mandatory | Description |
---|---|---|---|
binaries.check | true | no | |
Not currently implemented
Property | Default value | Mandatory | Description |
---|---|---|---|
url.mirror.apache.enabled | true | yes | |
url.mirror.apache | | yes | |
url.dist.apache | | no | Used when |
url.signature.apache | | yes if | Used when |
Property | Default value | Mandatory | Description |
---|---|---|---|
http.socket.timeout | 30000 | yes | Socket timeout |
http.connect.timeout | 30000 | yes | Connect timeout |
http.insecure | false | yes | Allow insecure SSL connections and transfers. |
DEK uses external resources hosted on mirrors, such as the Apache mirrors and many others. You may need to go through a proxy if your company or private network requires it. In DEK, you can change these properties:
Property | Default value | Mandatory | Description |
---|---|---|---|
proxy.enabled | false | no | Enable proxy configuration |
proxy.host | - | yes | |
proxy.port | - | yes | |
proxy.non-proxy-hosts | 127.0.0.1\|localhost | no | |
proxy.auth.type | none | yes | Possible values: none, ntlm, basic |
proxy.auth.ntlm.user | - | yes if | |
proxy.auth.ntlm.password | - | yes if | |
proxy.auth.ntlm.domain | - | yes if | |
proxy.auth.basic.user | - | yes if | |
proxy.auth.basic.password | - | yes if | |
From the project root, just run:
mvn clean package
At the end, you should find the final archive in cli/target/cli-X.X.X.tar.gz
mvn --batch-mode release:clean release:prepare -DignoreSnapshots=true -Dtag=v0.1 -DreleaseVersion=0.1 -DdevelopmentVersion=0.2-SNAPSHOT
mvn release:perform
Or, skipping the tests
mvn --batch-mode release:clean release:prepare -DignoreSnapshots=true -Dtag=v0.1 -DreleaseVersion=0.1 -DdevelopmentVersion=0.2-SNAPSHOT -DskipTests -DskipITs -Dmaven.javadoc.skip=true -Darguments="-Dmaven.javadoc.skip=true -DskipTests -DskipITs"
Force update the version:
mvn --batch-mode release:update-versions -DautoVersionSubmodules=true -DdevelopmentVersion=0.4-SNAPSHOT
See Proxy
Example for proxies without authentication
dek set env proxy.enabled true
dek set env proxy.host leproxy.intern
dek set env proxy.port 8888
Example for proxies requiring basic authentication
dek set env proxy.enabled true
dek set env proxy.host leproxy.intern
dek set env proxy.port 8888
dek set env proxy.auth.type basic
dek set env proxy.auth.basic.user myuser
dek set env proxy.auth.basic.password lepassword
Example for proxies requiring NTLM authentication
dek set env proxy.enabled true
dek set env proxy.host leproxy.intern
dek set env proxy.port 8888
dek set env proxy.auth.type ntlm
dek set env proxy.auth.ntlm.user myuser
dek set env proxy.auth.ntlm.password lepassword
dek set env proxy.auth.ntlm.domain INTERN
DEK manages a number of services (HDFS, HBase and many others). If your service is not yet managed by DEK, just create an implementation of fr.layer4.dek.binaries.ClientPreparer. See fr.layer4.dek.binaries.HdfsClientPreparer for an example.
- Allow using a custom path for DEK instead of the default ~/.dek via the configuration
- Allow using private binary repositories instead of the default Apache mirrors
- Allow skipping file integrity checks (especially for private repositories)
- Add an option to skip winutils for Hadoop
- Add a security (Kerberos) switch
- Manage other clients (Cassandra, MongoDB…)
- Add MapR
- Add an offline mode
- Force a version of a client for a cluster
Contributions are welcome! To submit a pull request you should fork the project repository, and make your change on a feature branch of your fork.
Copyright © 2012-2018 Vincent Devillers, and the individual contributors to DEK. Use of this software is granted under the terms of the MIT License.
See the LICENSE for the full license text.
Update the content of the file THIRD-PARTY.txt:
mvn org.codehaus.mojo:license-maven-plugin:aggregate-add-third-party@aggregate-add-third-party