dremio-jupyter-connection

Example of connecting a Jupyter Notebook to Dremio through the Dremio ODBC driver.


Plug DREMIO Data Lake Driver into Jupyter Notebooks

Standalone Container

Setup

import pandas as pd 
import pyodbc
import credentials # separate file with user credentials
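
The `credentials` module itself is not shown here. As a minimal sketch of what such a file could look like, assuming the user name and password are kept in environment variables (the variable names `DREMIO_USER` and `DREMIO_PASSWORD` and the helper `load_credentials` are hypothetical, not part of this repository):

```python
# credentials.py (hypothetical layout; the original file is not shown)
import os


def load_credentials(env=os.environ):
    """Return (user, password) from a mapping; missing keys give empty strings."""
    # The variable names below are assumptions made for this sketch.
    return env.get("DREMIO_USER", ""), env.get("DREMIO_PASSWORD", "")


# Module-level names so that `credentials.user` and `credentials.password` resolve.
user, password = load_credentials()
```

Keeping the values out of the notebook means the notebook can be shared or committed without exposing the login.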

Pyodbc settings

host = 'localhost'
port = 31010  # Dremio's default port for ODBC/JDBC clients
uid = credentials.user
pwd = credentials.password
driver = '/opt/dremio-odbc/lib64/libdrillodbc_sb64.so'  # default ODBC driver path on Ubuntu/Debian
cnxn = pyodbc.connect(
    "Driver={};ConnectionType=Direct;HOST={};PORT={};AuthenticationType=Plain;UID={};PWD={};".format(driver, host, port, uid, pwd),
    autocommit=True,
)
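
The connection string above is a sequence of `key=value;` pairs. One possible way to make it easier to read and change is to build it from a dictionary. The helper name `build_connection_string` is hypothetical, and this sketch does not escape values that themselves contain a semicolon:

```python
def build_connection_string(driver, host, port, uid, pwd):
    """Assemble the ODBC connection string from key/value pairs."""
    # Keys and fixed values mirror the string used in the settings above.
    parts = {
        "Driver": driver,
        "ConnectionType": "Direct",
        "HOST": host,
        "PORT": port,
        "AuthenticationType": "Plain",
        "UID": uid,
        "PWD": pwd,
    }
    # Dictionaries keep insertion order, so the pairs come out in the order listed.
    return "".join(f"{key}={value};" for key, value in parts.items())
```

The result can be passed to `pyodbc.connect(build_connection_string(driver, host, port, uid, pwd), autocommit=True)` in place of the formatted string.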

Read dataframe based on SQL Query

sql = 'SELECT * FROM "test"."weather" LIMIT 10'
df = pd.read_sql(sql, cnxn)
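
Recent pandas versions emit a `UserWarning` when `read_sql` is given a raw DB-API connection other than `sqlite3`, although the call still works. As an alternative sketch, the rows can be fetched through a cursor and turned into a dataframe afterwards. The helper name `fetch_rows` is hypothetical, and it relies only on the standard DB-API cursor methods that `pyodbc` provides:

```python
def fetch_rows(cnxn, sql):
    """Run `sql` on a DB-API connection and return (column_names, rows)."""
    cursor = cnxn.cursor()
    try:
        cursor.execute(sql)
        # cursor.description holds one entry per column; the name is its first field.
        columns = [col[0] for col in cursor.description]
        # Convert driver-specific row objects to plain tuples.
        rows = [tuple(row) for row in cursor.fetchall()]
    finally:
        cursor.close()
    return columns, rows
```

With the connection from above, `columns, rows = fetch_rows(cnxn, sql)` followed by `df = pd.DataFrame.from_records(rows, columns=columns)` should give the same dataframe as `pd.read_sql`.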

Output

df.head()
       STATION                           NAME  LATITUDE  LONGITUDE  ELEVATION        DATE  PRCP  SNOW  SNWD  TAVG  TMAX  TMIN
0  USW00023272  SAN FRANCISCO DOWNTOWN, CA US   37.7705  -122.4269       45.7  2018-01-01  0.00                      61    48
1  USW00023272  SAN FRANCISCO DOWNTOWN, CA US   37.7705  -122.4269       45.7  2018-01-02  0.00                      61    52
2  USW00023272  SAN FRANCISCO DOWNTOWN, CA US   37.7705  -122.4269       45.7  2018-01-03  0.09                      58    53
3  USW00023272  SAN FRANCISCO DOWNTOWN, CA US   37.7705  -122.4269       45.7  2018-01-04  0.06                      63    53
4  USW00023272  SAN FRANCISCO DOWNTOWN, CA US   37.7705  -122.4269       45.7  2018-01-05  0.26                      61    52

Requirements

pandas
pyodbc
Dremio ODBC driver (the settings above use the default Ubuntu/Debian install path)

References

DREMIO - The Data Lake Engine docs.