IQD-divertor is a crawler that collects Impala query information from Cloudera Manager. It stores the crawled information in a prepared Impala table, where analysts can use it for data analysis.
Requirements:
- JDK 7
- Maven (>= 3.0.0)
- Cloudera Manager (CDH >= 5.7.2)
- Cloudera Impala (>= impala-2.5.0+cdh5.7.2)
Important: Tested on CDH 5.7.2 and 5.12.1. Other versions are not guaranteed to work.
Installation and usage:
- Clone the source code and export the IQD_DIVERTOR_HOME environment variable:
git clone https://github.com/gridsum/IQD-divertor.git
cd IQD-divertor
echo "export IQD_DIVERTOR_HOME=\`pwd\`" >> ~/.bashrc
source ~/.bashrc
- Create the Impala table in your target database using the SQL in the file.
- Edit the IQD-divertor configuration. It is recommended to modify the configuration items according to the comments in the file.
- Copy the Hadoop configuration files core-site.xml and hdfs-site.xml into the project's resources directory.
- Package the project with Maven:
mvn clean package
- Start the daemon:
./bin/deamon.sh start
- Stop the daemon:
./bin/deamon.sh stop
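The step that copies core-site.xml and hdfs-site.xml into the resources directory matters because Hadoop's Configuration class loads core-site.xml from the classpath by default, and HdfsConfiguration adds hdfs-site.xml; Maven packages the resources directory (src/main/resources by Maven convention, assumed here) onto that classpath. The class below is a hypothetical sanity check, not part of IQD-divertor, that reports which of the two files a class loader cannot find.

```java
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of IQD-divertor: reports which Hadoop client
// configuration files cannot be found through the given class loader.
final class HadoopConfCheck {

    // Loaded from the classpath by Hadoop's Configuration (core-site.xml)
    // and HdfsConfiguration (hdfs-site.xml).
    static final String[] REQUIRED = {"core-site.xml", "hdfs-site.xml"};

    static List<String> missingResources(ClassLoader loader) {
        List<String> missing = new ArrayList<String>();
        for (String name : REQUIRED) {
            URL url = loader.getResource(name); // null when not on the classpath
            if (url == null) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> missing = missingResources(HadoopConfCheck.class.getClassLoader());
        if (missing.isEmpty()) {
            System.out.println("core-site.xml and hdfs-site.xml are on the classpath");
        } else {
            System.out.println("Missing from the classpath: " + missing);
        }
    }
}
```

Running such a check from the packaged jar before starting the daemon would surface a forgotten copy step early, instead of as an HDFS connection error later.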
IQD-divertor works as follows:
- First, crawl the Impala query information through the Cloudera Manager API.
- Second, extract the relevant fields and write them to a Parquet file.
- Third, upload the Parquet file to HDFS.
- Finally, load the Parquet-formatted data into the target Impala table.
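The first and last steps above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not IQD-divertor's actual code: it builds the Cloudera Manager request GET /api/{version}/clusters/{cluster}/services/{service}/impalaQueries (with the API's from, to, limit and offset parameters) and an Impala LOAD DATA INPATH statement. The class and method names, host, API version (v13 as an example; the server reports its own via /api/version), cluster, service, HDFS path and table name are all hypothetical. Steps 2 and 3 need the Parquet and Hadoop client libraries and are omitted.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical sketch of steps 1 and 4; names are illustrative only.
final class IqdPipelineSketch {

    // Step 1: Cloudera Manager impalaQueries endpoint.
    // "from"/"to" are ISO 8601 timestamps bounding the crawl window;
    // "limit"/"offset" page through the matching queries.
    static String impalaQueriesUrl(String cmBaseUrl, String apiVersion, String cluster,
                                   String service, String from, String to,
                                   int limit, int offset) {
        return cmBaseUrl + "/api/" + apiVersion
                + "/clusters/" + encodePathSegment(cluster)
                + "/services/" + encodePathSegment(service)
                + "/impalaQueries"
                + "?from=" + encode(from)
                + "&to=" + encode(to)
                + "&limit=" + limit
                + "&offset=" + offset;
    }

    // Step 4: move the uploaded Parquet file(s) from an HDFS path into the table.
    // Assumes an unpartitioned table and a path without single quotes.
    static String loadDataStatement(String hdfsPath, String database, String table) {
        return "LOAD DATA INPATH '" + hdfsPath + "' INTO TABLE " + database + "." + table;
    }

    // Form-encodes a value as UTF-8 (e.g. ':' becomes %3A, ' ' becomes '+').
    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Path segments need %20 rather than '+' for spaces (cluster names like "Cluster 1").
    private static String encodePathSegment(String value) {
        return encode(value).replace("+", "%20");
    }
}
```

For example, impalaQueriesUrl("http://cm-host:7180", "v13", "Cluster 1", "impala", "2017-09-01T00:00:00.000Z", "2017-09-01T01:00:00.000Z", 100, 0) yields a URL for the first 100 queries in that hour. A crawler could request it with HTTP Basic authentication and increase offset by limit until a page returns fewer than limit queries; this paging loop is an assumption about how such a crawler might work, not a description of IQD-divertor.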
See fields.md for a description of each field.
Contact: impala-toolbox-help@gridsum.com
IQD-divertor is licensed under the Apache License 2.0.