assimilate-databricks

A repo to assimilate Databricks

Primary language: Jupyter Notebook. License: Creative Commons Zero v1.0 Universal (CC0-1.0).

API Getting Started

databricks-api

Set up auth

databricks-python

Place in Codespace secrets

DATABRICKS_HOST
DATABRICKS_TOKEN
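
Once the two secrets are in place, Codespaces exposes them as environment variables. A minimal sketch of reading and checking them from Python follows; the helper name `load_databricks_credentials` is hypothetical and not part of this repo or any Databricks library.

```python
import os


def load_databricks_credentials(env=None):
    """Return (host, token) read from the environment.

    Hypothetical helper: raises RuntimeError if either secret is missing,
    so a misconfigured Codespace fails early with a readable message.
    """
    env = os.environ if env is None else env
    required = ("DATABRICKS_HOST", "DATABRICKS_TOKEN")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing Codespace secrets: " + ", ".join(missing))
    # Drop a trailing slash so the host can be joined with API paths later.
    return env["DATABRICKS_HOST"].rstrip("/"), env["DATABRICKS_TOKEN"]
```

Calling `load_databricks_credentials()` with no arguments reads the real environment; passing a dict makes it easy to try out without touching your secrets.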

Test out CLI

databricks clusters list --output JSON | jq
databricks fs ls dbfs:/
databricks jobs list --output JSON | jq
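
The cluster listing above can also be done from Python against the REST API. This is a rough sketch using only the standard library, assuming the `/api/2.0/clusters/list` endpoint and bearer-token auth with the same host and token as the CLI; the function names are illustrative, not from this repo.

```python
import json
import urllib.request


def build_clusters_list_request(host, token):
    """Build (but do not send) a GET request for the clusters list endpoint."""
    url = host.rstrip("/") + "/api/2.0/clusters/list"
    headers = {"Authorization": "Bearer " + token}
    return urllib.request.Request(url, headers=headers, method="GET")


def list_cluster_names(host, token):
    """Send the request and return the cluster names found in the response."""
    with urllib.request.urlopen(build_clusters_list_request(host, token)) as resp:
        payload = json.load(resp)
    # A workspace with no clusters may omit the "clusters" key entirely.
    return [cluster.get("cluster_name") for cluster in payload.get("clusters", [])]
```

Splitting request construction from sending keeps the URL and header logic easy to inspect without a live workspace.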

Remote connect

databricks-connect

Databricks SQL Connector

Set up a table first!

For remote SQL connection details, see https://docs.databricks.com/integrations/bi/jdbc-odbc-bi.html#connection-details-cluster
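
With the connection details from that page, a query through the Databricks SQL Connector might look like the sketch below. It assumes the `databricks-sql-connector` package is installed and that the HTTP path is stored in an environment variable named `DATABRICKS_HTTP_PATH`; that variable name and both function names are assumptions, not something this repo defines.

```python
import os


def sql_connection_params(env=None):
    """Build keyword arguments for databricks.sql.connect from the environment."""
    env = os.environ if env is None else env
    # The connector expects a bare hostname, so strip any scheme and trailing slash.
    hostname = env["DATABRICKS_HOST"].split("://", 1)[-1].rstrip("/")
    return {
        "server_hostname": hostname,
        "http_path": env["DATABRICKS_HTTP_PATH"],  # assumed variable name
        "access_token": env["DATABRICKS_TOKEN"],
    }


def run_query(query, env=None):
    """Run one SQL statement and return all rows."""
    # Imported lazily so the helper above works without the package installed.
    from databricks import sql  # pip install databricks-sql-connector

    with sql.connect(**sql_connection_params(env)) as connection:
        with connection.cursor() as cursor:
            cursor.execute(query)
            return cursor.fetchall()
```

For example, `run_query("SELECT 1")` is a cheap way to confirm the connection works before querying the table you set up.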

Comparing to Dask

Alternatives to Databricks include Dask (https://tutorial.dask.org/00_overview.html) and Ray.

Distributed compute

Hands on Enron

  • Download the data from Kaggle and upload it by right-clicking in the Explorer pane in GitHub Codespaces.
  • Place it in a "datasets" directory and add this directory to your .gitignore. This ensures you don't check a 1GB file in to GitHub.
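
The directory and .gitignore step can also be scripted. Below is a small sketch, assuming it runs from the repo root; `ensure_ignored_datasets_dir` is a hypothetical helper name, and the function is safe to call more than once.

```python
from pathlib import Path


def ensure_ignored_datasets_dir(repo_root=".", dirname="datasets"):
    """Create the datasets directory and make sure .gitignore lists it."""
    root = Path(repo_root)
    (root / dirname).mkdir(exist_ok=True)

    gitignore = root / ".gitignore"
    entry = dirname + "/"
    lines = gitignore.read_text().splitlines() if gitignore.exists() else []
    # Only append when neither "datasets" nor "datasets/" is already listed.
    if not any(line.strip() in (dirname, entry) for line in lines):
        lines.append(entry)
        gitignore.write_text("\n".join(lines) + "\n")
    return root / dirname
```

Run `ensure_ignored_datasets_dir()` once before uploading the Kaggle file so the large download never shows up in `git status`.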

Streamlit Example

Enable enron...

streamlit hello --server.enableCORS=false
streamlit run hello_streamlit_enron.py --server.enableCORS=false