PowerBI image

cloudera logo

Updated on 2021-09-27

😄 😀 😪 😌 😕 😮 😲 👍

powerbi_odbc_cloudera_impala

Hi! In this repository, I am going to show you how to connect PowerBI to Impala on a Cloudera data lake (the Impala you query through Hue).

There are two options.
(1) ODBC-based Method
(2) Impala Method

Before we start, please download the Impala ODBC connector from the Cloudera website.

(Hive is more robust for larger queries, but it is slower than Impala. Both connectors are linked here.)
https://www.cloudera.com/downloads/connectors/impala/odbc/2-6-0.html
https://www.cloudera.com/downloads/connectors/hive/odbc/2-6-1.html

Before you proceed, you might want to clear existing permissions.

If you previously tried to establish a connection and failed, PowerBI keeps the old settings, so you need to delete that configuration first.

Excel logo

If you see an error message like the one shown above, you can clear the data source settings by following the steps below.

Excel logo

Excel logo

Now let us return to the main topic

Option 1. ODBC-based Method

With the ODBC-based method, the Impala connection is configured inside an ODBC data source.

You can set up the connection as shown below 👍

ODBC impala

Please enable the SSL certificate as shown below.

SSL
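For readers who prefer to see the same settings as text, here is a minimal sketch of the ODBC connection string that corresponds to the dialog above. It is not part of the original walkthrough. The driver name `Cloudera ODBC Driver for Impala`, the keys `Host`, `Port` and `SSL`, the default port 21050, and the host `impala.example.com` are all assumptions; check them against the install guide of the driver version you downloaded.

```python
def build_impala_conn_str(host, port=21050, use_ssl=True, dsn=None):
    """Build an ODBC connection string for Impala (illustrative sketch only)."""
    if dsn:
        # A DSN created in the ODBC administrator already stores
        # the host, port, and SSL settings, so its name is enough.
        return f"DSN={dsn}"
    parts = {
        # Assumed driver name; use the name shown in your ODBC administrator.
        "Driver": "{Cloudera ODBC Driver for Impala}",
        "Host": host,
        "Port": str(port),
        # Assumed key: 1 turns SSL on, 0 turns it off.
        "SSL": "1" if use_ssl else "0",
    }
    return ";".join(f"{key}={value}" for key, value in parts.items())


# Hypothetical host name, used only for illustration.
conn_str = build_impala_conn_str("impala.example.com")
print(conn_str)
```

A string like this can be handed to any ODBC client to test the connection outside PowerBI. Authentication settings are left out on purpose because they depend on your cluster.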

+ Now we will connect PowerBI to ODBC

Choose ODBC as your data source

ODBC_step_1

ODBC_step_2

ODBC_step_3

Transform your data and load the data

ODBC_step_3

- Before you load the data, make sure you select only the columns you need. 
Otherwise, the load can take a very long time
- Even then, loading will take some time, depending on the source
- (Optional) Write a custom SQL query to aggregate the data (or try incremental load)
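The optional custom SQL query mentioned in the list above could look like the output of the sketch below. It builds an aggregate query that keeps only the required columns, so less data travels to PowerBI. The table `sales_db.orders` and the columns `region`, `order_year` and `amount` are hypothetical names, and the helper does no identifier escaping, so use it only with names you trust.

```python
def build_aggregate_query(table, group_cols, sum_col):
    """Return a SELECT that groups by a few columns and sums one measure."""
    cols = ", ".join(group_cols)
    return (
        f"SELECT {cols}, SUM({sum_col}) AS total_{sum_col} "
        f"FROM {table} GROUP BY {cols}"
    )


# Hypothetical table and column names, used only for illustration.
query = build_aggregate_query(
    "sales_db.orders", ["region", "order_year"], "amount"
)
print(query)
```

The printed statement can then be pasted wherever your connector accepts a custom SQL statement.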

Once you have loaded the data, you can proceed to create visuals

Option 2. Impala Method

Choose Impala as your data source

Impala

Enter the address of the Impala server.

Make sure the address does not contain a leading "https://".
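If you copy the server address from a browser, a small helper like the sketch below can remove the scheme before you paste it into PowerBI. The host name `impala.example.com` and port are hypothetical. Stripping "http://" and a trailing slash as well is an extra assumption on my part; the original advice only mentions "https://".

```python
def strip_scheme(address):
    """Remove a leading http(s):// and any trailing slash from an address."""
    address = address.strip()
    for prefix in ("https://", "http://"):
        if address.lower().startswith(prefix):
            address = address[len(prefix):]
            break
    return address.rstrip("/")


# Hypothetical server address, used only for illustration.
server = strip_scheme("https://impala.example.com:21050/")
print(server)
```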

Impala

Use your Windows credentials

Impala

Once you have loaded the data, you can proceed to create visuals

- Before you load the data, make sure you select only the columns you need. 
Otherwise, the load can take a very long time
- Even then, loading will take some time, depending on the source
- (Optional) Write a custom SQL query to aggregate the data (or try incremental load)

image

Next, when you publish your work to the PowerBI Service, you will have to set up the configuration correctly to ensure the data is refreshed properly.

But because I am tired, I will update this part later on. Email me at dkim19@its.jnj.com if you need help.

The END