IQD-divertor

IQD-divertor (Impala Query Details Divertor) collects Impala query details from Cloudera Manager.

1. Introduction

IQD-divertor is a crawler that collects Impala query information from Cloudera Manager. It stores the crawled information in a prepared Impala table, where analysts can query it for data analysis.

2. Installation

2.1. Dependencies

  • JDK 7
  • Maven (>= 3.0.0)
  • Cloudera Manager (CDH >= 5.7.2)
  • Cloudera Impala (>= impala-2.5.0+cdh5.7.2)

Important: IQD-divertor has been tested on CDH 5.7.2 and 5.12.1; other versions are not guaranteed to work.

2.2. Install

  • Clone the source code and export the environment variable.
git clone https://github.com/gridsum/IQD-divertor.git
cd IQD-divertor
echo "export IQD_DIVERTOR_HOME=$(pwd)" >> ~/.bashrc
  • Create the Impala table in your target database using the SQL in the file.

2.3. Configure & Package

  • Edit the IQD-divertor configuration. We recommend adjusting the configuration items according to the comments in the file.

  • Copy the Hadoop configuration files core-site.xml and hdfs-site.xml into the project's resources directory.

  • Package the project with Maven.

mvn clean package
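A quick way to confirm that core-site.xml and hdfs-site.xml actually ended up on the classpath is a small lookup like the one below. This is only a sketch: the class name HadoopConfigCheck and the method missingResources are hypothetical and not part of IQD-divertor, and it assumes the resources directory is packaged at the classpath root, as Maven does for src/main/resources.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of IQD-divertor: reports which of the
// expected Hadoop configuration files cannot be found on the classpath.
final class HadoopConfigCheck {

    // Returns the names that the given class loader cannot resolve.
    static List<String> missingResources(ClassLoader loader, String... names) {
        List<String> missing = new ArrayList<String>();
        for (String name : names) {
            if (loader.getResource(name) == null) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> missing = missingResources(
                HadoopConfigCheck.class.getClassLoader(),
                "core-site.xml", "hdfs-site.xml");
        if (missing.isEmpty()) {
            System.out.println("Both Hadoop configuration files are on the classpath.");
        } else {
            System.out.println("Missing from the classpath: " + missing);
        }
    }
}
```

Run it with the packaged jar (or target/classes) on the classpath; an empty result suggests the files were copied to the right place.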

2.4. Start/Stop daemon

  • Start daemon:
./bin/deamon.sh start
  • Stop daemon:
./bin/deamon.sh stop

3. Tutorials & Documentation

3.1. Principle

IQD-divertor works in four steps:

  • First, crawl the Impala query information through the Cloudera Manager API.
  • Second, extract the relevant fields and write them to a Parquet file.
  • Third, upload the Parquet file to HDFS.
  • Finally, load the Parquet-formatted data into the target Impala table.
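The first and last steps can be illustrated with a minimal sketch that only builds the request URL and the load statement. It is not the project's actual implementation. It assumes the Cloudera Manager REST endpoint /clusters/{cluster}/services/{service}/impalaQueries with the from, to, limit and offset parameters, and Impala's LOAD DATA INPATH statement. The class name, method names, host, API version, HDFS path and table name are all hypothetical placeholders.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical sketch, not IQD-divertor's real code.
final class DivertorPipelineSketch {

    // Percent-encode a value for a URL; spaces become %20 rather than '+'.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Step 1: URL of the Cloudera Manager Impala query listing for a time window.
    static String buildQueriesUrl(String cmBaseUrl, String apiVersion,
                                  String cluster, String service,
                                  String fromIso, String toIso,
                                  int limit, int offset) {
        return cmBaseUrl + "/api/" + apiVersion
                + "/clusters/" + encode(cluster)
                + "/services/" + encode(service)
                + "/impalaQueries"
                + "?from=" + encode(fromIso)
                + "&to=" + encode(toIso)
                + "&limit=" + limit
                + "&offset=" + offset;
    }

    // Step 4: Impala statement that moves an uploaded HDFS file into a table.
    static String buildLoadStatement(String hdfsPath, String database, String table) {
        if (hdfsPath.contains("'")) {
            throw new IllegalArgumentException("HDFS path must not contain a single quote");
        }
        return "LOAD DATA INPATH '" + hdfsPath + "' INTO TABLE " + database + "." + table;
    }

    public static void main(String[] args) {
        // Placeholder values; replace with your own cluster settings.
        System.out.println(buildQueriesUrl("http://cm-host:7180", "v12", "Cluster 1",
                "impala", "2017-10-01T00:00:00.000Z", "2017-10-01T01:00:00.000Z", 100, 0));
        System.out.println(buildLoadStatement("/tmp/iqd/queries.parquet",
                "analytics", "impala_queries"));
    }
}
```

Paging with limit and offset keeps each response small, and because LOAD DATA INPATH moves the file rather than copying it, the upload directory is emptied after each load.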

3.2. Fields in the Impala table

See fields.md for details about fields.

4. Communication

impala-toolbox-help@gridsum.com

5. License

IQD-divertor is licensed under the Apache License 2.0.