
Data engineering toolkit

Primary language: Python. License: Apache License 2.0 (Apache-2.0).

https://travis-ci.org/jpoullet2000/det.svg?branch=master

Data Engineering Toolkit

DET is built around a REST API (a Swagger-generated server). It provides services such as creating HDFS files and folders, purging and archiving them, and accessing cluster services and their parameters. Changes in HDFS are automatically recorded in Atlas.

https://user-images.githubusercontent.com/684574/40125807-188aeada-592c-11e8-9e7c-f97609c5648b.png

Overview

This server was generated by the swagger-codegen project. By using the OpenAPI-Spec from a remote server, you can easily generate a server stub. This is an example of building a swagger-enabled Flask server.

This example uses the Connexion library on top of Flask.

Requirements

Python 3.5+

Usage

Clone the repository:

git clone https://github.com/jpoullet2000/det.git

Modify the det/settings.py file according to your settings. If you want to set up your own custom configuration file/module, you need to create the environment variable DET_CONFIG:

export DET_CONFIG=/path/to/your/settings.py
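As an illustration of how an environment variable such as DET_CONFIG can point to a settings module, here is a minimal sketch. It is not DET's actual loading code; the function name load_settings and the module name det_user_settings are hypothetical.

```python
import importlib.util
import os


def load_settings(default_path):
    """Load a settings module from $DET_CONFIG, falling back to default_path.

    Hypothetical helper for illustration only; DET's own loader may differ.
    The file must have a .py suffix so that a source loader can be inferred.
    """
    path = os.environ.get("DET_CONFIG", default_path)
    spec = importlib.util.spec_from_file_location("det_user_settings", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the settings file as Python code
    return module
```

Calling load_settings("det/settings.py") would then return a module whose attributes (for example HDFS_USER) hold the configured values.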

Make sure to set HDFS_USER = None if you don't want to impersonate the hdfs user when writing to HDFS.
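To make the effect of such a setting concrete, the sketch below builds a WebHDFS MKDIRS URL and adds the standard user.name query parameter only when a user is given. Whether DET passes HDFS_USER to WebHDFS in exactly this way is an assumption; the function name and the namenode address used in the example are hypothetical.

```python
from urllib.parse import urlencode


def webhdfs_mkdirs_url(namenode, path, hdfs_user=None):
    """Build a WebHDFS MKDIRS URL (hypothetical helper, not DET's code).

    user.name is only appended when hdfs_user is not None, which mirrors
    the idea of HDFS_USER = None meaning "do not impersonate".
    """
    params = [("op", "MKDIRS")]
    if hdfs_user is not None:
        params.append(("user.name", hdfs_user))
    return "{}/webhdfs/v1{}?{}".format(namenode.rstrip("/"), path, urlencode(params))
```

For instance, webhdfs_mkdirs_url("http://namenode:50070", "/tmp/demo", "hdfs") requests the directory creation as the hdfs user, while omitting the last argument leaves the request unattributed.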

You also need to set up your credentials:

chmod +x create_credentials.sh
./create_credentials.sh

Then modify the file ~/.credentials.json to match your settings. For now this file cannot be renamed, because of a limitation in the credentials package. Set the 'TEST_FLAG' item to false to secure the app with API tokens.
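If you prefer to switch the flag from a script rather than by hand, a minimal sketch follows. It assumes that 'TEST_FLAG' is a top-level key of the JSON file, which the text above does not confirm; the function name is hypothetical.

```python
import json


def disable_test_flag(path):
    """Set TEST_FLAG to false in a credentials JSON file.

    Assumes TEST_FLAG is a top-level key (an assumption, not confirmed by
    the DET documentation). All other entries are written back unchanged.
    """
    with open(path) as fh:
        creds = json.load(fh)
    creds["TEST_FLAG"] = False
    with open(path, "w") as fh:
        json.dump(creds, fh, indent=2)
    return creds
```

A typical call would be disable_test_flag(os.path.expanduser("~/.credentials.json")) after importing os.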

To run the server, please execute the following from the root directory:

pip install -r requirements.txt
python setup.py install
det runserver -p 9999

Then open your browser at:

http://localhost:9999/detapi/0.0.3/ui/

Your Swagger definition lives here:

http://localhost:9999/detapi/0.0.3/swagger.json
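Since the definition is plain JSON, it can also be inspected from a script. The sketch below lists the HTTP operations declared in a Swagger 2.0 document; the helper names are hypothetical, and fetch_spec only works while the server is running at the URL you pass in.

```python
import json
from urllib.request import urlopen

# HTTP methods that may appear as keys of a Swagger path item.
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options"}


def list_operations(spec):
    """Return sorted (METHOD, path) pairs declared under spec["paths"]."""
    operations = []
    for path, item in sorted(spec.get("paths", {}).items()):
        for method in sorted(item):
            if method in HTTP_METHODS:  # skip keys such as "parameters"
                operations.append((method.upper(), path))
    return operations


def fetch_spec(url):
    """Download and parse a Swagger JSON document from a running server."""
    with urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, list_operations(fetch_spec("http://localhost:9999/detapi/0.0.3/swagger.json")) gives a quick overview of the available endpoints.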

Note that if your WebHDFS service is kerberized, you also need to install the requests_kerberos module:

pip install requests_kerberos

Make sure to use pykerberos >= 1.2.1; version 1.1.14 is known not to work properly with DET. Other dependencies such as krb5 are also needed, so if you are in a conda environment, it is strongly recommended to use the command:

conda install "pykerberos>=1.2.1" requests-kerberos
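To check an installed version against this minimum from Python, a simple comparison like the one below can help. It is a rough sketch with hypothetical helper names that compares only the first three numeric components; it is not a full version parser.

```python
import re


def version_tuple(version):
    """Turn a version string such as '1.2.1' into a tuple like (1, 2, 1)."""
    return tuple(int(n) for n in re.findall(r"\d+", version)[:3])


def meets_minimum(version, minimum="1.2.1"):
    """Return True when version is at least minimum (numeric parts only)."""
    return version_tuple(version) >= version_tuple(minimum)
```

On Python 3.8 or later, the installed version string can be obtained with importlib.metadata.version("pykerberos") and passed to meets_minimum.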

To activate the use of Kerberos in det/settings.py, set KERBEROS_ACTIVE = True.

To launch the integration tests, use tox:

sudo pip install tox
tox

You can choose to run the webserver on a different port:

det runserver -p <port>

Note that port 8888 is often taken by other apps, especially if you are running the det server on a machine with a Hortonworks or Cloudera distribution.
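Before choosing a port, you can check whether something is already listening on it. The following is a minimal sketch using only the standard library; the function name is hypothetical, and it only tests TCP on the given host.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if a TCP connection to (host, port) succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0
```

For example, if port_in_use(8888) returns True, pick another value for the -p option.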

For a detailed usage description of the command:

det runserver --help

Running with Docker

First copy .credentials.json into the repository root (make sure you are in the root directory):

cp ~/.credentials.json .

To run the server in a Docker container, execute the following from the root directory:

# building the image
docker build -t det .

# starting up a container
docker run -p 9999:9999 det

If you are running HDP locally, it is recommended to replace the last command with:

docker run --network host -p 9999:9999 det

Development

The API is based on the swagger/swagger.yaml file. Code is generated with the swagger code generator. To run the code generator:

java -jar <path_to_swagger-codegen-cli.jar_dir>/swagger-codegen-cli.jar generate -i det/swagger/swagger.yaml -l python-flask -o <output_dir> -c det/swagger/python_codegen_config.json