/knox-cloudera

This repository contains a Parcel and a CSD to use Apache Knox from within the Cloudera Manager

Primary LanguageShellApache License 2.0Apache-2.0

Apache Knox Parcel & CSD

This repository contains an Ansible playbook to create a Parcel for Apache Knox as well as the definitions needed to build a CSD. Both these things are needed if you want to integrate Knox into Cloudera Manager.

Parcel

To build the Parcel execute the parcel.yml playbook from the main directory using Ansible. There is a script called build-parcel.sh that can be used to kick off the process. The finished parcel will be in the work/output directory. Cloudera Manager requires one Parcel per supported target distribution. We only create a single physical file (with the suffix -universal) and symlink the others.

Note: This does not include the call to make_manifest.py which is needed to create the manifest.json file.

You can use this Parcel and drop it in the /opt/cloudera/parcel-cache directory. For RedHat/CentOS you need to change the suffix to -el7. For a full list of suffixes check the cm-ext Github wiki. If you go this route you also need to create a *.sha file containing the SHA checksum. This can be calculated using sha1sum.

We (OpenCore) do not currently host a repository with pre-built Parcels but we might do so in the future.

CSD

There is also a CSD to be able to manage Knox from within Cloudera Manager. It will start Knox but it will run with its default configuration.

To build the CSD JAR file run the build.sh script in the knox-csd directory. Copy the resulting JAR file to /opt/cloudera/csd and restart Cloudera Manager. You also need to restart the Cloudera Management Service once!

To configure Knox you’ll need to manually change the topologies etc. in the data directory itself. The Knox User Guide can help.

You will also need to manually create Proxy Users in the various services that Knox should access.

There are some problems with how Knox processes environment variables and Java system properties, some of which I’ve outlined on the Knox dev mailing list.

Implementation notes

Changing the path where Knox looks for gateway-log4j.properties

Knox uses the PropertyConfigurator to initialize Log4J using the Java system property log4j.configuration. If that is not defined it uses a default from its embedded gateway.cfg file which points to ${GATEWAY_HOME}/conf/${launcher.name}-log4j.properties. For this CSD we’ve modified the gateway.sh script to take an extra environment variable called GATEWAY_LOG_OPTS which we then point at the right location.

Changing the path where Knox looks for the gateway-site.xml file

Knox looks for gateway-site.xml in these locations in this order:

  1. Java System Property GATEWAY_HOME/conf/gateway-site.xml

  2. Environment variable GATEWAY_HOME/conf/gateway-site.xml

  3. Java System Property user.dir/conf/gateway-site.xml

  4. Classpath conf/gateway-site.xml

The path part conf is hardcoded.

Unfortunately the launcher has an embedded cfg file that contains a hardcoded GATEWAY_HOME property which the Invoker class then propagates to a Java System property. The only way to have the environment variable take effect is by removing the default conf/gateway-site.xml file from the Knox distribution. The directory conf needs to stay though because gateway.sh checks for its existence. The path is hardcoded in the script and cannot be changed even though its pointing to the wrong location. Solution is to create an empty conf directory or to patch the gateway.sh file. This parcel does the former.